Publications

  • Yihong Liu, Mingyang Wang, Amir Hossein Kargaran, Ayyoob Imani, Orgest Xhelili, et al. How Transliterations Improve Crosslingual Alignment. The 31st International Conference on Computational Linguistics (COLING), Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04909505⟩
  • Paul Lerner, François Yvon. Towards the Machine Translation of Scientific Neologisms. Report D2-3.1, ISIR, Université Pierre et Marie Curie UMR CNRS 7222. 2025. ⟨hal-04852293⟩
  • Ziqian Peng, Rachel Bawden, François Yvon. Investigating Length Issues in Document-level Machine Translation. 2024. ⟨hal-04906015⟩
  • Mohamed Salim Aissi, Clément Romac, Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, et al. Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting. 2024. ⟨hal-04844077⟩
  • Paul Lerner, François Yvon. Towards the Machine Translation of Scientific Neologisms. 2024. ⟨hal-04835653v2⟩
  • Loris Gaven, Clément Romac, Thomas Carta, Sylvain Lamprier, Olivier Sigaud, et al. SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling. IMOL 2024 - Intrinsically Motivated Open-ended Learning (Workshop at NeurIPS), Dec 2024, Vancouver, Canada. ⟨hal-04844089⟩
  • Paul Lerner, François Yvon. Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs. 2024. ⟨hal-04831106⟩
  • Amir Hossein Kargaran, François Yvon, Hinrich Schütze. GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages. Conference on Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, Dec 2024, Vancouver, Canada. ⟨hal-04830151⟩
  • Yannis Karmim, Marc Lafon, Raphaël Fournier-S'Niehotta, Nicolas Thome. Supra-Laplacian Encoding for Transformer on Dynamic Graphs. The Thirty-eighth Annual Conference on Neural Information Processing Systems, Dec 2024, Vancouver, Canada. ⟨hal-04785441⟩
  • Nicolas Perrin-Gilbert. Ingredients for Motion Planning-powered Reinforcement Learning. Computer Science [cs]. Sorbonne Université, 2024. ⟨tel-04927374⟩
  • Nicolas Dahan, Rachel Bawden, François Yvon. Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level. Inria Paris; Sorbonne Université. 2024. ⟨hal-04798759⟩
  • Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. What Makes Multimodal In-Context Learning Work? 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jun 2024, Seattle, United States. pp.1539-1550, ⟨10.1109/CVPRW63382.2024.00161⟩. ⟨hal-04791285⟩
  • Laura Nguyen, Benjamin Piwowarski, Julio Laborde, Gilles Moyse. Learning Reading Order via Document Layout with Layout2Pos. Linking Theory and Practice of Digital Libraries, Sep 2024, Ljubljana, Slovenia. pp.3-19, ⟨10.1007/978-3-031-72437-4_1⟩. ⟨hal-04718874⟩
  • João Maria Janeiro, Benjamin Piwowarski, Patrick Gallinari, Loïc Barrault. MEXMA: Token-level objectives improve sentence representations. 2024. ⟨hal-04788199⟩
  • Amir Hossein Kargaran, François Yvon, Hinrich Schütze. MaskLID: Code-Switching Language Identification through Iterative Masking. 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.459-469. ⟨hal-04670790⟩
  • Mathias Vast, Basile van Cooten, Laure Soulier, Benjamin Piwowarski. Which Neurons Matter in IR? Applying Integrated Gradients-based Methods to Understand Cross-Encoders. ICTIR '24: The 2024 ACM SIGIR International Conference on the Theory of Information Retrieval, Jul 2024, Washington DC, United States. pp.133-143, ⟨10.1145/3664190.3672528⟩. ⟨hal-04668348⟩
  • Sadaf Abdul Rauf, François Yvon. Translating scientific abstracts in the bio-medical domain with structure-aware models. Computer Speech and Language, 2024, 87, pp.101623. ⟨10.1016/j.csl.2024.101623⟩. ⟨hal-04476788⟩
  • Tanguy Herserant, Tristan Luiggi, Laure Soulier, Vincent Guigue. MeLaSSS : Métrique dans l’espace latent sur les phrases simplifiées. EvalLLM2024 - Atelier sur l'évaluation des modèles génératifs (LLM) et challenge d'extraction d'information few-shot, AMIAD, Ministère des Armées, Jul 2024, Toulouse, France. ⟨hal-04678042⟩
  • Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, et al. Evaluer BLOOM en français. EvalLLM2024 - Atelier sur l'évaluation des modèles génératifs (LLM) et challenge d'extraction d'information few-shot, AMIAD, Ministère des Armées, Jul 2024, Toulouse, France. ⟨hal-04678039⟩
  • Yannis Karmim, Elias Ramzi, Raphaël Fournier-S'Niehotta, Nicolas Thome. ITEM: Improving Training and Evaluation of Message-Passing based GNNs for top-k recommendation. Transactions on Machine Learning Research Journal, in press. ⟨hal-04645098⟩
  • Matthieu Cord. Vision & Language with transformers. CAp (Conférence sur l'Apprentissage automatique) and RFIAP (Reconnaissance des Formes, Image, Apprentissage et Perception) 2024, Jul 2024, Lille, France. ⟨hal-04634976⟩
  • Rachel Bawden, Ziqian Peng, Maud Bénard, Eric Villemonte de La Clergerie, Raphaël Esamotunu, et al. Translate your Own: a Post-Editing Experiment in the NLP domain. The 25th Annual Conference of the European Association for Machine Translation, European Association for Machine Translation, Jun 2024, Sheffield, United Kingdom. ⟨hal-04573922⟩
  • Maxime Bouthors, Josep Crego, François Yvon. Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Association for Computational Linguistics, Jun 2024, Mexico City, Mexico. pp.3022-3039, ⟨10.18653/v1/2024.findings-naacl.190⟩. ⟨hal-04670614⟩
  • Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, et al. PETRA: Parallel End-to-end Training with Reversible Architectures. 2024. ⟨hal-04594647⟩
  • Ziqian Peng, Rachel Bawden, François Yvon. Handling Very Long Contexts in Neural Machine Translation: a Survey. Deliverable D3-2.1, ANR MaTOS project. 2024, 50 pp. ⟨hal-04652584v2⟩
  • Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, et al. ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training. 2024. ⟨hal-04592562⟩
  • Amir Hossein Kargaran, François Yvon, Hinrich Schütze. GlotScript: A Resource and Tool for Low Resource Writing System Identification. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), European Language Resources Association (ELRA); International Committee on Computational Linguistics (ICCL), May 2024, Torino, Italy. ⟨hal-04587980⟩
  • Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant. Towards Effective and Efficient Sparse Neural Information Retrieval. ACM Transactions on Information Systems, 2024, 42 (5), pp.1-46. ⟨10.1145/3634912⟩. ⟨hal-04787990⟩
  • Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. What Makes Multimodal In-Context Learning Work? 2024. ⟨hal-04788197⟩
  • Nicolas Perrin-Gilbert. AFU: Actor-Free critic Updates in off-policy RL for continuous control. 2024. ⟨hal-04822857⟩
  • Etienne Le Naour, Louis Serrano, Léon Migus, Yuan Yin, Ghislain Agoua, et al. Time Series Continuous Modeling for Imputation and Forecasting with Implicit Neural Representations. Transactions on Machine Learning Research Journal, 2024, ⟨10.48550/arXiv.2306.05880⟩. ⟨hal-04759780⟩
  • Yuxuan Zong, Benjamin Piwowarski. Structured representation for Information Retrieval. COnférence en Recherche d'Informations et Applications, Apr 2024, La Rochelle, France. ⟨10.24348/coria.2024.abstract_24⟩. ⟨hal-04788243⟩
  • Emanuele Dalsasso, Clément Rambour, Nicolas Trouvé, Nicolas Thome. MERLIN-Seg: self-supervised despeckling for label-efficient semantic segmentation. Computer Vision and Image Understanding, 2024, 241, ⟨10.1016/j.cviu.2024.103940⟩. ⟨hal-04163624v2⟩
  • Manuel Faysse, Patrick Fernandes, Nuno Guerreiro, Antonio Loison, Duarte Alves, et al. CroissantLLM: A Truly Bilingual French-English Language Model. 2024. ⟨hal-04574908⟩
  • Mathias Vast, Yuxuan Zong, Benjamin Piwowarski, Laure Soulier. Simple Domain Adaptation for Sparse Retrievers. Advances in Information Retrieval, 14610, Springer Nature Switzerland, pp.403-412, 2024, Lecture Notes in Computer Science, ⟨10.1007/978-3-031-56063-7_32⟩. ⟨hal-04517668⟩
  • Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier. Training Table Question Answering via SQL Query Decomposition. 2024. ⟨hal-04788185⟩
  • Alexandre Chenu, Olivier Serris, Olivier Sigaud, Nicolas Perrin-Gilbert. Single-Reset Divide & Conquer Imitation Learning. 2024. ⟨hal-04822877⟩
  • Vaynee Sungeelee, Antoine Loriette, Olivier Sigaud, Baptiste Caramiaux. Interactive curriculum learning increases and homogenizes motor smoothness. Scientific Reports, 2024, ⟨10.1038/s41598-024-53253-3⟩. ⟨hal-04529557⟩
  • Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, et al. Les modèles Bloom pour le traitement automatique de la langue française. 2024. ⟨hal-04435371⟩
  • Noémie Jacquet, Vincent Guigue, Cristina Manfredotti, Fatiha Saïs, Stéphane Dervaux, et al. Modélisation du caractère séquentiel des repas pour améliorer la performance d'un système de recommandation alimentaire. Extraction et Gestion des Connaissances (EGC 2024), Jan 2024, Dijon, France. ⟨hal-04440140⟩
  • Rémy Sun, Clément Masson, Gilles Hénaff, Nicolas Thome, Matthieu Cord. Semantic augmentation by mixing contents for semi-supervised learning. Pattern Recognition, 2024, 145, pp.109909. ⟨10.1016/j.patcog.2023.109909⟩. ⟨hal-04385089⟩
  • Paul Lerner, François Yvon. Vers la traduction automatique des néologismes scientifiques. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.245-261. ⟨hal-04623021⟩
  • Ziqian Peng, Rachel Bawden, François Yvon. À propos des difficultés de traduire automatiquement de longs documents. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.2-21. ⟨hal-04623006⟩
  • Florian Le Bronnec, Song Duong, Alexandre Allauzen, Vincent Guigue, Alberto Lumbreras, et al. LOCOST: Modèles Espace-État pour le Résumé Abstractif de Documents Longs. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. p.11. ⟨hal-04622998⟩
  • Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier. Learning Relational Decomposition of Queries for Question Answering from Tables. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024, Bangkok, Thailand. pp.10471-10485. ⟨hal-04677411⟩
  • Maxime Bouthors, Josep Crego, François Yvon. Optimiser le choix des exemples pour la traduction automatique augmentée par des mémoires de traduction. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.582-604. ⟨hal-04623042⟩
  • Bérengère Podvin, L. Soucasse, F. Yvon. Analysis of Rayleigh-Bénard convection using latent Dirichlet allocation. Physical Review Fluids, 2024, 9 (6), pp.063502. ⟨10.1103/PhysRevFluids.9.063502⟩. ⟨hal-04729077⟩
  • Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Audebert, Nicolas Thome. GalLoP: Learning Global and Local Prompts for Vision-Language Models. The 18th European Conference on Computer Vision ECCV 2024, Sep 2024, Milan, Italy. ⟨10.48550/arXiv.2407.01400⟩. ⟨hal-04635800⟩
  • Yannis Karmim, Leshanshui Yang, Raphaël Fournier-S'Niehotta, Clément Chatelain, Sébastien Adam, et al. Temporal receptive field in dynamic graph learning: A comprehensive analysis. MLG Workshop at ECML-PKDD, Sep 2024, Vilnius, Lithuania. ⟨hal-04647025v2⟩
  • Mustafa Shukor, Nicolas Thome, Matthieu Cord. Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval. Computer Vision and Image Understanding, 2024, 247. ⟨hal-04743466⟩