MLIA Team publications

Amir Hossein Kargaran, Yihong Liu, François Yvon, Hinrich Schütze. How Programming Concepts and Neurons Are Shared in Code Language Models. Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025, Vienna, Austria. pp.26905-26917, ⟨10.18653/v1/2025.findings-acl.1379⟩. ⟨hal-05209663⟩

[ HTTP | PDF ]

Dávid Javorský, Ondřej Bojar, François Yvon. Prompting LLMs: Length Control for Isometric Machine Translation. 22nd International Conference on Spoken Language Translation (IWSLT 2025), Jul 2025, Vienna, Austria. pp.119-137, ⟨10.18653/v1/2025.iwslt-1.11⟩. ⟨hal-05208907⟩

[ HTTP | PDF ]

Amir Hossein Kargaran, Ali Modarressi, Nafiseh Nikeghbal, Jana Diesner, François Yvon, et al. MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment. Findings of the Association for Computational Linguistics: ACL 2025, Association for Computational Linguistics, Jul 2025, Vienna, Austria. pp.27001-27023, ⟨10.18653/v1/2025.findings-acl.1385⟩. ⟨hal-05207048⟩

[ HTTP | PDF ]

Dávid Javorský, Ondřej Bojar, François Yvon. MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines. 63rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Jul 2025, Vienna, Austria. pp.16339-16356, ⟨10.18653/v1/2025.acl-long.797⟩. ⟨hal-05207042⟩

[ HTTP | PDF ]

Matthieu Dubois, François Yvon, Pablo Piantanida. MOSAIC: Multiple Observers Spotting AI Content. Findings of the Association for Computational Linguistics: ACL 2025, Association for Computational Linguistics, Jul 2025, Vienna, Austria. pp.24230-24247, ⟨10.18653/v1/2025.findings-acl.1244⟩. ⟨hal-05207044⟩

[ HTTP | PDF ]

Renhao Pei, Yihong Liu, Peiqin Lin, François Yvon, Hinrich Schütze. Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu. 63rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Jul 2025, Vienna, Austria. pp.8767-8788. ⟨hal-05188526⟩

[ HTTP | PDF ]

Mathias Vast, Basile van Cooten, Laure Soulier, Benjamin Piwowarski. Understanding Matching Mechanisms in Cross-Encoders. Workshop on Explainability in Information Retrieval, Jul 2025, Padova, Italy. ⟨hal-05122957v1⟩

[ HTTP | PDF ]

Maëlis Morier, Jairo Rodríguez-Padilla, Maxime Sermesant, Patrick Gallinari. Learning Cardiac Electrophysiology with Graph Neural Networks for Fast Data-driven Personalised Predictions. FIMH 2025 - 13th Functional Imaging and Modeling of the Heart International Conference, Radomír Chabiniok, Jun 2025, Dallas (TX), United States. pp.LNCS 15672 and LNCS 15673. ⟨hal-05114524v2⟩

[ HTTP | PDF ]

Yuxuan Zong, Benjamin Piwowarski. Towards Lossless Token Pruning in Late-Interaction Retrieval Models. The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2025, Padova, Italy. 2025, 979-8-4007-1592-1/2025/07. ⟨10.1145/3726302.3730100⟩. ⟨hal-05037885⟩

[ HTTP | PDF ]

Lucas Schott, Elies Gherbi, Hatem Hajri, Sylvain Lamprier. Ensemble et Fusion d'Agents RL pour la Robustesse aux Attaques Adverses. Conférence sur l’Apprentissage automatique (CAp), Jun 2025, Dijon, France. ⟨hal-05071249⟩

[ HTTP ]

Ben Kabongo, Vincent Guigue, Pirmin Lemberger. Prédiction des préférences et génération de revue personnalisée basées sur les aspects et attention. Actes de CORIA-TALN-RJCRI-RECITAL 2025, Jun 2025, Marseille, France. pp.151-170. ⟨hal-05165021⟩

[ HTTP | PDF ]

Virgile Guéneau, Laurent Guillier, Cécile Berdous, Raphael Rubrice, Antoine Carlioz, et al. Bottom-up assembly of beneficial multi-species biofilms targeting undesirable bacteria using 3D fluorescence imaging. XXX Congreso Sociedad Espanola de Microbiologia, SEM, Jun 2025, Jaen, Spain. ⟨hal-05036755⟩

[ HTTP ]

Virgile Guéneau, Laurent Guillier, Raphael Rubrice, Cécile Berdous, Sabit Ahmed, et al. Live fluorescence confocal imaging to study microbial interaction in multi-species biofilms. Journée annuelle France BioImaging IDF Sud, France BioImaging, Jun 2025, Jouy-en-Josas, France. ⟨hal-05110654⟩

[ HTTP ]

Nicolas Castanet, Olivier Sigaud, Sylvain Lamprier. Imagine Beyond! Distributionally Robust Auto-Encoding for State Space Coverage in Online Reinforcement Learning. 2025. ⟨hal-05118820⟩

[ HTTP ]

Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, et al. PETRA: Parallel End-to-end Training with Reversible Architectures. 2025. ⟨hal-04594647v2⟩

[ HTTP | PDF ]

Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, et al. ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training. 2025. ⟨hal-04592562v2⟩

[ HTTP | PDF ]

Lise Le Boudec, Emmanuel de Bézenac, Louis Serrano, Ramon Daniel Regueiro-Espino, Yuan Yin, et al. Learning a neural solver for parametric PDEs to enhance physics-informed methods. ICLR 2025 - Thirteenth International Conference on Learning Representations, Apr 2025, Singapore, Singapore. 2025. ⟨hal-05093905⟩

[ HTTP | PDF ]

Lise Le Boudec, Emmanuel de Bézenac, Louis Serrano, Ramon Daniel Regueiro-Espino, Yuan Yin, et al. Learning a neural solver for parametric PDEs to enhance physics-informed methods. ICLR 2025 - Thirteenth International Conference on Learning Representations, Apr 2025, Singapore, Singapore. ⟨hal-05093943⟩

[ HTTP | PDF ]

Yihong Liu, Mingyang Wang, Amir Hossein Kargaran, Ayyoob Imani, Orgest Xhelili, et al. How Transliterations Improve Crosslingual Alignment. The 31st International Conference on Computational Linguistics (COLING), Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04909505⟩

[ HTTP | PDF ]

Paul Lerner, François Yvon. Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs. COLING 2025, Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04831106⟩

[ HTTP | PDF ]

Elias Ramzi, Nicolas Audebert, Clément Rambour, André Araujo, Xavier Bitot, et al. Optimization of Rank Losses for Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, pp.1-12. ⟨10.1109/TPAMI.2025.3543846⟩. ⟨hal-04975847⟩

[ HTTP | PDF ]

Paul Lerner, François Yvon. Towards the Machine Translation of Scientific Neologisms. COLING 2025, 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04835653v2⟩

[ HTTP | PDF ]

Julian Agudelo, Vincent Guigue, Cristina Manfredotti, Hadrien Piot. Drought Forecasting Using a Hybrid Neural Architecture for Integrating Time Series and Static Data. Tackling Climate Change with Machine Learning workshop at ICLR 2025, Apr 2025, Singapore, Singapore. ⟨10.48550/arXiv.2504.05957⟩. ⟨hal-05132868⟩

[ HTTP | PDF ]

Paul Lerner, François Yvon. Towards the Machine Translation of Scientific Neologisms. Rapport D2-3.1, ISIR, Université Pierre et Marie Curie UMR CNRS 7222. 2025. ⟨hal-04852293⟩

[ HTTP | PDF ]

Mathias Vast, Basile van Cooten, Laure Soulier, Benjamin Piwowarski. Comprendre la Nature des Signaux de Correspondance dans les Modèles Neuronaux pour la RI. Conférence en Recherche d’Information et Applications, Association francophone de Recherche d’Information et Applications, Jun 2025, Marseille, France. ⟨hal-05122843⟩

[ HTTP | PDF ]

Ziqian Peng, Rachel Bawden, François Yvon. Investigating Length Issues in Document-level Machine Translation. 2024. ⟨hal-04906015⟩

[ HTTP ]

Mohamed Salim Aissi, Clément Romac, Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, et al. Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting. 2024. ⟨hal-04844077⟩

[ HTTP ]

Loris Gaven, Clément Romac, Thomas Carta, Sylvain Lamprier, Olivier Sigaud, et al. SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling. IMOL 2024 - Intrinsically Motivated Open-ended Learning (Workshop at NeurIPS), Dec 2024, Vancouver, Canada. 2024. ⟨hal-04844089⟩

[ HTTP ]

Amir Hossein Kargaran, François Yvon, Hinrich Schütze. GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages. Conference on Neural Information Processing Systems (NeurIPS) - Datasets and Benchmarks Track, Dec 2024, Vancouver, Canada. ⟨hal-04830151⟩

[ HTTP | PDF ]

Yannis Karmim, Marc Lafon, Raphaël Fournier-S'Niehotta, Nicolas Thome. Supra-Laplacian Encoding for Transformer on Dynamic Graphs. The Thirty-eighth Annual Conference on Neural Information Processing Systems, Dec 2024, Vancouver (CA), Canada. ⟨hal-04785441⟩

[ HTTP | PDF ]

Nicolas Perrin-Gilbert. Ingredients for Motion Planning-powered Reinforcement Learning. Computer Science [cs]. Sorbonne Université, 2024. ⟨tel-04927374⟩

[ HTTP | PDF ]

Nicolas Dahan, Rachel Bawden, François Yvon. Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level. Inria Paris, Sorbonne Université; Sorbonne Université; Inria Paris. 2024. ⟨hal-04798759⟩

[ HTTP | PDF ]

Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. What Makes Multimodal In-Context Learning Work? 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jun 2024, Seattle, United States. pp.1539-1550, ⟨10.1109/CVPRW63382.2024.00161⟩. ⟨hal-04791285⟩

[ HTTP | PDF ]

Tristan Luiggi, Tanguy Herserant, Thong Tran, Laure Soulier, Vincent Guigue. CALM: Context Augmentation with Large Language Model for Named Entity Recognition. Linking Theory and Practice of Digital Libraries, 15177, Springer Nature Switzerland, pp.273-291, 2024, Lecture Notes in Computer Science, 978-3-031-72437-4. ⟨10.1007/978-3-031-72437-4_16⟩. ⟨hal-04764424⟩

[ HTTP | PDF ]

Laura Nguyen, Benjamin Piwowarski, Julio Laborde, Gilles Moyse. Learning Reading Order via Document Layout with Layout2Pos. Linking Theory and Practice of Digital Libraries, Sep 2024, Ljubljana, Slovenia. pp.3-19, ⟨10.1007/978-3-031-72437-4_1⟩. ⟨hal-04718874⟩

[ HTTP | PDF ]

João Maria Janeiro, Benjamin Piwowarski, Patrick Gallinari, Loïc Barrault. MEXMA: Token-level objectives improve sentence representations. 2024. ⟨hal-04788199⟩

[ HTTP | PDF ]

Amir Hossein Kargaran, François Yvon, Hinrich Schütze. MaskLID: Code-Switching Language Identification through Iterative Masking. 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.459-469. ⟨hal-04670790⟩

[ HTTP | PDF ]

Mathias Vast, Basile van Cooten, Laure Soulier, Benjamin Piwowarski. Which Neurons Matter in IR? Applying Integrated Gradients-based Methods to Understand Cross-Encoders. ICTIR '24: The 2024 ACM SIGIR International Conference on the Theory of Information Retrieval, Jul 2024, Washington DC, United States. pp.133-143, ⟨10.1145/3664190.3672528⟩. ⟨hal-04668348⟩

[ HTTP ]

Sadaf Abdul Rauf, François Yvon. Translating scientific abstracts in the bio-medical domain with structure-aware models. Computer Speech and Language, 2024, 87, pp.101623. ⟨10.1016/j.csl.2024.101623⟩. ⟨hal-04476788⟩

[ HTTP | PDF ]

Tanguy Herserant, Tristan Luiggi, Laure Soulier, Vincent Guigue. MeLaSSS : Métrique dans l’espace latent sur les phrases simplifiées. EvalLLM2024 - Atelier sur l'évaluation des modèles génératifs (LLM) et challenge d'extraction d'information few-shot, AMIAD, Ministères des Armées, Jul 2024, Toulouse, France. ⟨hal-04678042⟩

[ HTTP | PDF ]

Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, et al. Evaluer BLOOM en français. EvalLLM2024 - Atelier sur l'évaluation des modèles génératifs (LLM) et challenge d'extraction d'information few-shot, AMIAD, Ministères des Armées, Jul 2024, Toulouse, France. ⟨hal-04678039⟩

[ HTTP | PDF ]

Matthieu Cord. Vision & Language with transformers. CAp (Conférence sur l'Apprentissage automatique) and RFIAP (Reconnaissance des Formes, Image, Apprentissage et Perception) 2024, Jul 2024, Lille, France. ⟨hal-04634976⟩

[ HTTP ]

Yannis Karmim, Elias Ramzi, Raphaël Fournier-S'Niehotta, Nicolas Thome. ITEM: Improving Training and Evaluation of Message-Passing based GNNs for top-k recommendation. Transactions on Machine Learning Research Journal, In press. ⟨hal-04645098⟩

[ HTTP | PDF ]

Rachel Bawden, Ziqian Peng, Maud Bénard, Eric Villemonte de La Clergerie, Raphaël Esamotunu, et al. Translate your Own: a Post-Editing Experiment in the NLP domain. The 25th Annual Conference of the European Association for Machine Translation, European Association for Machine Translation, Jun 2024, Sheffield, United Kingdom. ⟨hal-04573922⟩

[ HTTP | PDF ]

Maxime Bouthors, Josep Crego, François Yvon. Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Association for Computational Linguistics, Jun 2024, Mexico, Mexico. pp.3022-3039, ⟨10.18653/v1/2024.findings-naacl.190⟩. ⟨hal-04670614⟩

[ HTTP | PDF ]

Ziqian Peng, Rachel Bawden, François Yvon. Handling Very Long Contexts in Neural Machine Translation: a Survey. Livrable D3-2.1, Projet ANR MaTOS. 2024, pp.50. ⟨hal-04652584v2⟩

[ HTTP | PDF ]

Amir Hossein Kargaran, François Yvon, Hinrich Schütze. GlotScript: A Resource and Tool for Low Resource Writing System Identification. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA Language Resources Association (ELRA); International Committee on Computational Linguistics (ICCL), May 2024, Torino, Italy. ⟨hal-04587980⟩

[ HTTP | PDF ]

Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant. Towards Effective and Efficient Sparse Neural Information Retrieval. ACM Transactions on Information Systems, 2024, 42 (5), pp.1-46. ⟨10.1145/3634912⟩. ⟨hal-04787990⟩

[ HTTP ]

Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. What Makes Multimodal In-Context Learning Work? 2024. ⟨hal-04788197⟩

[ HTTP | PDF ]

Nicolas Perrin-Gilbert. AFU: Actor-Free critic Updates in off-policy RL for continuous control. 2024. ⟨hal-04822857⟩

[ HTTP ]