MLIA Team publications

  • Yannis Karmim, Marc Lafon, Raphaël Fournier-S'Niehotta, Nicolas Thome. Supra-Laplacian Encoding for Transformer on Dynamic Graphs. The Thirty-eighth Annual Conference on Neural Information Processing Systems, Dec 2024, Vancouver (CA), Canada. ⟨hal-04785441⟩
  • Nicolas Dahan, Rachel Bawden, François Yvon. Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level. Inria Paris; Sorbonne Université. 2024. ⟨hal-04798759⟩
  • Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. What Makes Multimodal In-Context Learning Work?. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jun 2024, Seattle, United States. pp.1539-1550, ⟨10.1109/CVPRW63382.2024.00161⟩. ⟨hal-04791285⟩
  • Laura Nguyen, Benjamin Piwowarski, Julio Laborde, Gilles Moyse. Learning Reading Order via Document Layout with Layout2Pos. Linking Theory and Practice of Digital Libraries, Sep 2024, Ljubljana, Slovenia. pp.3-19, ⟨10.1007/978-3-031-72437-4_1⟩. ⟨hal-04718874⟩
  • João Maria Janeiro, Benjamin Piwowarski, Patrick Gallinari, Loïc Barrault. MEXMA: Token-level objectives improve sentence representations. 2024. ⟨hal-04788199⟩
  • Amir Hossein Kargaran, François Yvon, Hinrich Schütze. MaskLID: Code-Switching Language Identification through Iterative Masking. 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.459-469. ⟨hal-04670790⟩
  • Mathias Vast, Basile van Cooten, Laure Soulier, Benjamin Piwowarski. Which Neurons Matter in IR? Applying Integrated Gradients-based Methods to Understand Cross-Encoders. ICTIR '24: The 2024 ACM SIGIR International Conference on the Theory of Information Retrieval, Jul 2024, Washington DC, United States. pp.133-143, ⟨10.1145/3664190.3672528⟩. ⟨hal-04668348⟩
  • Sadaf Abdul Rauf, François Yvon. Translating scientific abstracts in the bio-medical domain with structure-aware models. Computer Speech and Language, 2024, 87, pp.101623. ⟨10.1016/j.csl.2024.101623⟩. ⟨hal-04476788⟩
  • Tanguy Herserant, Tristan Luiggi, Laure Soulier, Vincent Guigue. MeLaSSS : Métrique dans l’espace latent sur les phrases simplifiées. Atelier sur l'évaluation des modèles génératifs (LLM) et challenge d'extraction d'information few-shot, Institut des sciences informatiques et de leurs interactions - CNRS Sciences informatiques [INS2I-CNRS], Jul 2024, Toulouse, France. ⟨hal-04678042⟩
  • Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, et al.. Evaluer BLOOM en français. Atelier sur l'évaluation des modèles génératifs (LLM) et challenge d'extraction d'information few-shot, Institut des sciences informatiques et de leurs interactions - CNRS Sciences informatiques [INS2I-CNRS], Jul 2024, Toulouse, France. ⟨hal-04678039⟩
  • Yannis Karmim, Elias Ramzi, Raphaël Fournier-S'Niehotta, Nicolas Thome. ITEM: Improving Training and Evaluation of Message-Passing based GNNs for top-k recommendation. Transactions on Machine Learning Research Journal, In press. ⟨hal-04645098⟩
  • Matthieu Cord. Vision & Language with transformers. CAp (Conférence sur l'Apprentissage automatique) and RFIAP (Reconnaissance des Formes, Image, Apprentissage et Perception) 2024, Jul 2024, Lille, France. ⟨hal-04634976⟩
  • Rachel Bawden, Ziqian Peng, Maud Bénard, Eric Villemonte de La Clergerie, Raphaël Esamotunu, et al.. Translate your Own: a Post-Editing Experiment in the NLP domain. The 25th Annual Conference of the European Association for Machine Translation, European Association for Machine Translation, Jun 2024, Sheffield, United Kingdom. ⟨hal-04573922⟩
  • Maxime Bouthors, Josep Crego, François Yvon. Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Association for Computational Linguistics, Jun 2024, Mexico City, Mexico. pp.3022-3039, ⟨10.18653/v1/2024.findings-naacl.190⟩. ⟨hal-04670614⟩
  • Stéphane Rivaud, Louis Fournier, Thomas Pumir, Eugene Belilovsky, Michael Eickenberg, et al.. PETRA: Parallel End-to-end Training with Reversible Architectures. 2024. ⟨hal-04594647⟩
  • Ziqian Peng, Rachel Bawden, François Yvon. Handling Very Long Contexts in Neural Machine Translation: a Survey. Livrable D3-2.1, Projet ANR MaTOS. 2024, pp.50. ⟨hal-04652584v2⟩
  • Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, et al.. ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training. 2024. ⟨hal-04592562⟩
  • Amir Hossein Kargaran, François Yvon, Hinrich Schütze. GlotScript: A Resource and Tool for Low Resource Writing System Identification. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA Language Resources Association (ELRA); International Committee on Computational Linguistics (ICCL), May 2024, Torino, Italy. ⟨hal-04587980⟩
  • Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant. Towards Effective and Efficient Sparse Neural Information Retrieval. ACM Transactions on Information Systems, 2024, 42 (5), pp.1-46. ⟨10.1145/3634912⟩. ⟨hal-04787990⟩
  • Folco Bertini Baldassini, Mustafa Shukor, Matthieu Cord, Laure Soulier, Benjamin Piwowarski. What Makes Multimodal In-Context Learning Work?. 2024. ⟨hal-04788197⟩
  • Etienne Le Naour, Louis Serrano, Léon Migus, Yuan Yin, Ghislain Agoua, et al.. Time Series Continuous Modeling for Imputation and Forecasting with Implicit Neural Representations. Transactions on Machine Learning Research Journal, 2024, ⟨10.48550/arXiv.2306.05880⟩. ⟨hal-04759780⟩
  • Yuxuan Zong, Benjamin Piwowarski. Structured representation for Information Retrieval. COnférence en Recherche d'Informations et Applications, Apr 2024, La Rochelle, France. ⟨10.24348/coria.2024.abstract_24⟩. ⟨hal-04788243⟩
  • Emanuele Dalsasso, Clément Rambour, Nicolas Trouvé, Nicolas Thome. MERLIN-Seg: self-supervised despeckling for label-efficient semantic segmentation. Computer Vision and Image Understanding, 2024, 241, ⟨10.1016/j.cviu.2024.103940⟩. ⟨hal-04163624v2⟩
  • Manuel Faysse, Patrick Fernandes, Nuno Guerreiro, Antonio Loison, Duarte Alves, et al.. CroissantLLM: A Truly Bilingual French-English Language Model. 2024. ⟨hal-04574908⟩
  • Mathias Vast, Yuxuan Zong, Benjamin Piwowarski, Laure Soulier. Simple Domain Adaptation for Sparse Retrievers. Advances in Information Retrieval, 14610, Springer Nature Switzerland, pp.403-412, 2024, Lecture Notes in Computer Science, ⟨10.1007/978-3-031-56063-7_32⟩. ⟨hal-04517668⟩
  • Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier. Training Table Question Answering via SQL Query Decomposition. 2024. ⟨hal-04788185⟩
  • Vaynee Sungeelee, Antoine Loriette, Olivier Sigaud, Baptiste Caramiaux. Interactive curriculum learning increases and homogenizes motor smoothness. Scientific Reports, 2024, ⟨10.1038/s41598-024-53253-3⟩. ⟨hal-04529557⟩
  • Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, et al.. Les modèles Bloom pour le traitement automatique de la langue française. 2024. ⟨hal-04435371⟩
  • Noémie Jacquet, Vincent Guigue, Cristina Manfredotti, Fatiha Saïs, Stéphane Dervaux, et al.. Modélisation du caractère séquentiel des repas pour améliorer la performance d'un système de recommandation alimentaire. Extraction et Gestion des Connaissances (EGC 2024), Jan 2024, Dijon, France. ⟨hal-04440140⟩
  • Paul Lerner, François Yvon. Vers la traduction automatique des néologismes scientifiques. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.245-261. ⟨hal-04623021⟩
  • Rémy Sun, Clément Masson, Gilles Hénaff, Nicolas Thome, Matthieu Cord. Semantic augmentation by mixing contents for semi-supervised learning. Pattern Recognition, 2024, 145, pp.109909. ⟨10.1016/j.patcog.2023.109909⟩. ⟨hal-04385089⟩
  • Florian Le Bronnec, Song Duong, Alexandre Allauzen, Vincent Guigue, Alberto Lumbreras, et al.. LOCOST: Modèles Espace-État pour le Résumé Abstractif de Documents Longs. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.11-11. ⟨hal-04622998⟩
  • Ziqian Peng, Rachel Bawden, François Yvon. À propos des difficultés de traduire automatiquement de longs documents. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.2-21. ⟨hal-04623006⟩
  • Bérengère Podvin, L. Soucasse, F. Yvon. Analysis of Rayleigh-Bénard convection using latent Dirichlet allocation. Physical Review Fluids, 2024, 9 (6), pp.063502. ⟨10.1103/PhysRevFluids.9.063502⟩. ⟨hal-04729077⟩
  • Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier. Learning Relational Decomposition of Queries for Question Answering from Tables. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024, Bangkok, Thailand. pp.10471-10485. ⟨hal-04677411⟩
  • Maxime Bouthors, Josep Crego, François Yvon. Optimiser le choix des exemples pour la traduction automatique augmentée par des mémoires de traduction. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.582-604. ⟨hal-04623042⟩
  • Marc Lafon, Elias Ramzi, Clément Rambour, Nicolas Audebert, Nicolas Thome. GalLoP: Learning Global and Local Prompts for Vision-Language Models. The 18th European Conference on Computer Vision ECCV 2024, Sep 2024, Milan, Italy. ⟨10.48550/arXiv.2407.01400⟩. ⟨hal-04635800⟩
  • Florian Le Bronnec, Song Duong, Mathieu Ravaut, Alexandre Allauzen, Nancy F. Chen, et al.. LOCOST: State-Space Models for Long Document Abstractive Summarization. European Chapter of the Association for Computational Linguistics (EACL), Mar 2024, St. Julian’s, Malta. ⟨hal-04438465⟩
  • Yannis Karmim, Leshanshui Yang, Raphaël Fournier-S'Niehotta, Clément Chatelain, Sébastien Adam, et al.. Temporal receptive field in dynamic graph learning: A comprehensive analysis. MLG Workshop at ECML-PKDD, Sep 2024, Vilnius, Lithuania. ⟨hal-04647025v2⟩
  • Vincent Guigue. Generative AI: tools & challenges. JOBIM, 2024, Toulouse, France. ⟨hal-04792761⟩
  • Santiago Herrera, Caio Corro, Sylvain Kahane. Régression logistique parcimonieuse pour l'extraction automatique de règles de grammaire. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.211-218. ⟨hal-04623018⟩
  • François Yvon. La traduction multilingue : analyse d'une prouesse technologique. Mediazioni. Rivista online di studi interdisciplinari su lingue e culture, 2023, 39, pp.A17-A34. ⟨10.6092/issn.1974-4382/18785⟩. ⟨hal-04365112⟩
  • Léo Grinsztajn, Myung Jun Kim, Edouard Oyallon, Gaël Varoquaux. Vectorizing string entries for data processing on tables: when are larger language models better?. 2023. ⟨hal-04345931⟩
  • Skander Karkar, Ibrahim Ayed, Emmanuel de Bézenac, Patrick Gallinari. Module-wise Training of Neural Networks via the Minimizing Movement Scheme. Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023), Dec 2023, New Orleans (Louisiana), United States. ⟨hal-04223364⟩
  • Adel Nabli, Eugene Belilovsky, Edouard Oyallon. $\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning. Thirty-seventh Conference on Neural Information Processing Systems, Dec 2023, New Orleans, United States. ⟨hal-04124318v2⟩
  • Shu Okabe, François Yvon. Towards Multilingual Interlinear Morphological Glossing. 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Dec 2023, Singapore, Singapore. pp.5958-5971, ⟨10.18653/v1/2023.findings-emnlp.396⟩. ⟨hal-04357157⟩
  • Gwen Legate, Nicolas Bernier, Lucas Caccia, Edouard Oyallon, Eugene Belilovsky. Guiding The Last Layer in Federated Learning with Pre-Trained Models. NeurIPS, In press. ⟨hal-04262471⟩
  • Maxime Bouthors, Josep Crego, François Yvon. Towards Example-Based NMT with Multi-Levenshtein Transformers. Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Dec 2023, Singapore, Singapore. pp.1830-1846. ⟨hal-04332427⟩
  • Edouard Oyallon. Contributions to Local, Asynchronous and Decentralized Learning, and to Geometric Deep Learning. Artificial Intelligence [cs.AI]. Sorbonne Université, 2023. ⟨tel-04334118⟩
  • Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze. GlotLID: Language Identification for Low-Resource Languages. Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics, Dec 2023, Singapore, Singapore. pp.6155-6218. ⟨hal-04332442⟩