Mustafa Shukor

PhD student
Team: MLIA

Publications

Mustafa Shukor. Efficient and scalable multimodal learning. Machine Learning [cs.LG]. Sorbonne Université, 2026. English. ⟨NNT : 2026SORUS035⟩. ⟨tel-05651880⟩
Delong Chen, Mustafa Shukor, Theo Moutakanni, Willy Chung, Jade Yu, et al. VL-JEPA: Joint Embedding Predictive Architecture for Vision-language. International Conference on Learning Representations (ICLR), Apr 2026, Rio de Janeiro, Brazil. 2026. ⟨hal-05660881⟩
Delong Chen, Tejaswi Kasarla, Yejin Bang, Mustafa Shukor, Willy Chung, et al. Action100M: A Large-scale Video Action Dataset. Computer Vision and Pattern Recognition (CVPR) EgoVis Workshop, 2026. ⟨hal-05660896⟩
Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, Arnaud Dapogny, Alasdair Newson, et al. Learning to Steer: Input-dependent Steering for Multimodal LLMs. 39th Conference on Neural Information Processing Systems (NeurIPS 2025), Nov 2025, San Diego (CA), United States. ⟨hal-05462808⟩
Pegah Khayatan, Mustafa Shukor, Jayneel Parekh, Arnaud Dapogny, Matthieu Cord. Analyzing Finetuning Representation Shift for Multimodal LLMs Steering. IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2025, Honolulu, Hawaii, United States. ⟨hal-05462785⟩
Paul Couairon, Mustafa Shukor, Jean-Emmanuel Haugeard, Matthieu Cord, Nicolas Thome. DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut. Neural Information Processing Systems, Dec 2024, Vancouver, Canada. ⟨hal-05483233⟩
Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, Alasdair Newson, Matthieu Cord. A Concept-Based Explainability Framework for Large Multimodal Models. Advances in Neural Information Processing Systems 37, Dec 2024, Vancouver, Canada. pp.135783-135818, ⟨10.52202/079017-4312⟩. ⟨hal-05462746⟩
Mustafa Shukor, Matthieu Cord. Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs. Advances in Neural Information Processing Systems (NeurIPS), Dec 2024, Vancouver, Canada. ⟨hal-04743447⟩
Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. What Makes Multimodal In-Context Learning Work? 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jun 2024, Seattle, United States. pp.1539-1550, ⟨10.1109/CVPRW63382.2024.00161⟩. ⟨hal-04791285⟩
Mustafa Shukor, Alexandre Rame, Corentin Dancette, Matthieu Cord. Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning. The Twelfth International Conference on Learning Representations (ICLR), May 2024, Vienna, Austria. ⟨hal-04505149⟩
Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. What Makes Multimodal In-Context Learning Work? 2024. ⟨hal-04788197⟩
Mustafa Shukor, Nicolas Thome, Matthieu Cord. Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval. Computer Vision and Image Understanding, 2024, 247. ⟨hal-04743466⟩
Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord. UnIVAL: Unified Model for Image, Video, Audio and Language Tasks. Transactions on Machine Learning Research Journal, 2023. ⟨hal-04366059⟩
Mustafa Shukor, Corentin Dancette, Matthieu Cord. eP-ALM: Efficient Perceptual Augmentation of Language Models. International Conference on Computer Vision (ICCV23), Oct 2023, Paris, France. pp.22056-22069, ⟨10.48550/arXiv.2303.11403⟩. ⟨hal-04232603⟩
Mustafa Shukor, Guillaume Couairon, Matthieu Cord. Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment. 33rd British Machine Vision Conference (BMVC), Nov 2022, London, United Kingdom. ⟨hal-03811336⟩