
François Yvon
- Team: MLIA
- Office: H13
- Email: firstname dot lastname at sorbonne-universite dot fr
- Website: https://fyvo.github.io
- Bio: François Yvon is a CNRS senior researcher (directeur de recherche); he conducts his research in natural language processing within the "Machine Learning and Deep Learning for Intelligent Access" group at ISIR. His recent work focuses on machine translation using neural and probabilistic methods and, more broadly, on multilingual natural language processing. Previously, F. Yvon was director of LIMSI/CNRS in Orsay and professor of computer science at Université Paris-Sud and at Télécom Paris, and was briefly a visiting researcher at the IBM T.J. Watson Research Center (NY).
Publications
- Paul Lerner, François Yvon. Assessing the Political Fairness of Multilingual LLMs: A Case Study based on a 21-way Multiparallel EuroParl Dataset. 2025. ⟨hal-05328251⟩
- Amir Hossein Kargaran, Yihong Liu, François Yvon, Hinrich Schütze. How Programming Concepts and Neurons Are Shared in Code Language Models. Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025, Vienna, Austria. pp.26905-26917, ⟨10.18653/v1/2025.findings-acl.1379⟩. ⟨hal-05209663⟩
- Dávid Javorský, Ondřej Bojar, François Yvon. Prompting LLMs: Length Control for Isometric Machine Translation. 22nd International Conference on Spoken Language Translation (IWSLT 2025), Jul 2025, Vienna, Austria. pp.119-137, ⟨10.18653/v1/2025.iwslt-1.11⟩. ⟨hal-05208907⟩
- Amir Hossein Kargaran, Ali Modarressi, Nafiseh Nikeghbal, Jana Diesner, François Yvon, et al. MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment. Findings of the Association for Computational Linguistics: ACL 2025, Association for Computational Linguistics, Jul 2025, Vienna, Austria. pp.27001-27023, ⟨10.18653/v1/2025.findings-acl.1385⟩. ⟨hal-05207048⟩
- Dávid Javorský, Ondřej Bojar, François Yvon. MockConf: A Student Interpretation Dataset: Analysis, Word- and Span-level Alignment and Baselines. 63rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Jul 2025, Vienna, Austria. pp.16339-16356, ⟨10.18653/v1/2025.acl-long.797⟩. ⟨hal-05207042⟩
- Matthieu Dubois, François Yvon, Pablo Piantanida. MOSAIC: Multiple Observers Spotting AI Content. Findings of the Association for Computational Linguistics: ACL 2025, Association for Computational Linguistics, Jul 2025, Vienna, Austria. pp.24230-24247, ⟨10.18653/v1/2025.findings-acl.1244⟩. ⟨hal-05207044⟩
- Renhao Pei, Yihong Liu, Peiqin Lin, François Yvon, Hinrich Schütze. Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu. 63rd Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Jul 2025, Vienna, Austria. pp.8767-8788. ⟨hal-05188526⟩
- Maxime Bouthors, Josep Crego, François Yvon. Améliorer la Traduction Neuronale par Exemple avec des Données Monolingues. 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), ATALA, Jul 2025, Marseille, France. ⟨hal-05316675⟩
- Matthieu Dubois, Pablo Piantanida, François Yvon. MOSAIC : Mélange d'experts pour la détection de textes artificiels. 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), ATALA, Jul 2025, Marseille, France. ⟨hal-05317590⟩
- Joanna Radoła, François Yvon. Alignements divisifs de textes parallèles: données, algorithme et évaluation. 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), ATALA, Jun 2025, Marseille, France. ⟨hal-05316630⟩
- Rachel Bawden, Maud Bénard, Eric Villemonte de La Clergerie, José Cornejo Cárcamo, Nicolas Dahan, et al. MaTOS: Machine Translation for Open Science. 20th Machine Translation Summit, International Machine Translation Association, Jun 2025, Geneva, Switzerland. ⟨hal-05228687⟩
- Yihong Liu, Mingyang Wang, Amir Hossein Kargaran, Ayyoob Imani, Orgest Xhelili, et al. How Transliterations Improve Crosslingual Alignment. The 31st International Conference on Computational Linguistics (COLING), Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04909505⟩
- Paul Lerner, François Yvon. Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs. COLING 2025, Jan 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04831106⟩
- Paul Lerner, François Yvon. Towards the Machine Translation of Scientific Neologisms. COLING 2025, 2025, Abu Dhabi, United Arab Emirates. ⟨hal-04835653v2⟩
- Paul Lerner, François Yvon. Towards the Machine Translation of Scientific Neologisms. Rapport D2-3.1, ISIR, Université Pierre et Marie Curie UMR CNRS 7222. 2025. ⟨hal-04852293⟩
- Paul Lerner, Laurène Cave, Hal Daumé, Léo Labat, Gaël Lejeune, et al. Comment mesurer les biais politiques des grands modèles de langue multilingues? 20e Conférence en Recherche d’Information et Applications (CORIA) 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN) 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL) Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), 2025, Marseille, France. pp.1-7. ⟨hal-05324834⟩
- Ziqian Peng, Rachel Bawden, François Yvon. Investigating Length Issues in Document-level Machine Translation. Machine Translation Summit XX, European Machine Translation Association, Jun 2025, Geneva, Switzerland. pp.4-23. ⟨hal-04906015⟩
- Amir Hossein Kargaran, François Yvon, Hinrich Schütze. GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages. Conference on Neural Information Processing Systems (NeurIPS) - Datasets and Benchmarks Track, Dec 2024, Vancouver, Canada. ⟨hal-04830151⟩
- Nicolas Dahan, Rachel Bawden, François Yvon. Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level. Inria Paris; Sorbonne Université. 2024. ⟨hal-04798759⟩
- Amir Hossein Kargaran, François Yvon, Hinrich Schütze. MaskLID: Code-Switching Language Identification through Iterative Masking. 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Association for Computational Linguistics, Aug 2024, Bangkok, Thailand. pp.459-469. ⟨hal-04670790⟩
- Sadaf Abdul Rauf, François Yvon. Translating scientific abstracts in the bio-medical domain with structure-aware models. Computer Speech and Language, 2024, 87, pp.101623. ⟨10.1016/j.csl.2024.101623⟩. ⟨hal-04476788⟩
- Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, et al. Evaluer BLOOM en français. EvalLLM2024 - Atelier sur l'évaluation des modèles génératifs (LLM) et challenge d'extraction d'information few-shot, AMIAD, Ministères des Armées, Jul 2024, Toulouse, France. ⟨hal-04678039⟩
- Rachel Bawden, Ziqian Peng, Maud Bénard, Eric Villemonte de La Clergerie, Raphaël Esamotunu, et al. Translate your Own: a Post-Editing Experiment in the NLP domain. The 25th Annual Conference of the European Association for Machine Translation, European Association for Machine Translation, Jun 2024, Sheffield, United Kingdom. ⟨hal-04573922⟩
- Maxime Bouthors, Josep Crego, François Yvon. Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison. 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Association for Computational Linguistics, Jun 2024, Mexico City, Mexico. pp.3022-3039, ⟨10.18653/v1/2024.findings-naacl.190⟩. ⟨hal-04670614⟩
- Ziqian Peng, Rachel Bawden, François Yvon. Handling Very Long Contexts in Neural Machine Translation: a Survey. Livrable D3-2.1, Projet ANR MaTOS. 2024, pp.50. ⟨hal-04652584v2⟩
- Amir Hossein Kargaran, François Yvon, Hinrich Schütze. GlotScript: A Resource and Tool for Low Resource Writing System Identification. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA Language Resources Association (ELRA); International Committee on Computational Linguistics (ICCL), May 2024, Torino, Italy. ⟨hal-04587980⟩
- Manuel Faysse, Patrick Fernandes, Nuno Guerreiro, Antonio Loison, Duarte Alves, et al. CroissantLLM: A Truly Bilingual French-English Language Model. 2024. ⟨hal-04574908⟩
- Rachel Bawden, Hatim Bourfoune, Bertrand Cabot, Nathan Cassereau, Pierre Cornette, et al. Les modèles Bloom pour le traitement automatique de la langue française. 2024. ⟨hal-04435371⟩
- Ziqian Peng, Rachel Bawden, François Yvon. À propos des difficultés de traduire automatiquement de longs documents. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.2-21. ⟨hal-04623006⟩
- Paul Lerner, François Yvon. Vers la traduction automatique des néologismes scientifiques. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.245-261. ⟨hal-04623021⟩
- Maxime Bouthors, Josep Crego, François Yvon. Optimiser le choix des exemples pour la traduction automatique augmentée par des mémoires de traduction. 35èmes Journées d'Études sur la Parole (JEP 2024) 31ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2024) 26ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL 2024), Jul 2024, Toulouse, France. pp.582-604. ⟨hal-04623042⟩
- Bérengère Podvin, L. Soucasse, F. Yvon. Analysis of Rayleigh-Bénard convection using latent Dirichlet allocation. Physical Review Fluids, 2024, 9 (6), pp.063502. ⟨10.1103/PhysRevFluids.9.063502⟩. ⟨hal-04729077⟩
- François Yvon. La traduction multilingue : analyse d'une prouesse technologique. Mediazioni. Rivista online di studi interdisciplinari su lingue e culture, 2023, 39, pp.A17-A34. ⟨10.6092/issn.1974-4382/18785⟩. ⟨hal-04365112⟩
- Shu Okabe, François Yvon. Towards Multilingual Interlinear Morphological Glossing. 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Dec 2023, Singapore, Singapore. pp.5958-5971, ⟨10.18653/v1/2023.findings-emnlp.396⟩. ⟨hal-04357157⟩
- Maxime Bouthors, Josep Crego, François Yvon. Towards Example-Based NMT with Multi-Levenshtein Transformers. Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Dec 2023, Singapore, Singapore. pp.1830-1846. ⟨hal-04332427⟩
- Alban Petit, Caio Corro, François Yvon. Structural generalization in COGS: Supertagging is (almost) all you need. 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023, Singapore, Singapore. pp.1089-1101, ⟨10.18653/v1/2023.emnlp-main.69⟩. ⟨hal-04382463⟩
- Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze. GlotLID: Language Identification for Low-Resource Languages. Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics, Dec 2023, Singapore, Singapore. pp.6155-6218. ⟨hal-04332442⟩
- Shu Okabe, François Yvon. LISN @ SIGMORPHON 2023 Shared Task on Interlinear Glossing. The 20th SIGMORPHON workshop on Computational Morphology, Phonology, and Phonetics, Association for Computational Linguistics, Jul 2023, Toronto, Canada. ⟨10.18653/v1/2023.sigmorphon-1.21⟩. ⟨hal-04186388⟩
- Dávid Javorský, Ondřej Bojar, François Yvon. Assessing Word Importance Using Models Trained for Semantic Tasks. 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), ACL, Jul 2023, Toronto, Canada. pp.8846-8856. ⟨hal-04163044⟩
- Josep Crego, Jitao Xu, François Yvon. BiSync: A Bilingual Editor for Synchronized Monolingual Texts. The 61st Annual Meeting of the Association for Computational Linguistics, ACL, Jul 2023, Toronto, Canada. pp.369-376. ⟨hal-04163029⟩
- Ayyoob Imani, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, et al. Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages. 61st Annual Meeting of the Association for Computational Linguistics, ACL, Jul 2023, Toronto, Canada. ⟨hal-04163023⟩
- Shu Okabe, François Yvon. Joint Word and Morpheme Segmentation with Bayesian Non-Parametric Models. 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Association for Computational Linguistics, May 2023, Dubrovnik, Croatia. pp.628-642, ⟨10.18653/v1/2023.findings-eacl.48⟩. ⟨hal-04086368⟩
- Jitao Xu, Josep Crego, François Yvon. Integrating Translation Memories into Non-Autoregressive Machine Translation. 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), May 2023, Dubrovnik, Croatia. ⟨10.18653/v1/2023.eacl-main.96⟩. ⟨hal-03995339⟩
- Rachel Bawden, François Yvon. Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM. 24th Annual Conference of the European Association for Machine Translation (EAMT 2023), Jun 2023, Tampere, Finland. ⟨10.48550/ARXIV.2303.01911⟩. ⟨hal-04015863v2⟩
- Maud Bénard, Alexandra Mestivier, Natalie Kubler, Lichao Zhu, Rachel Bawden, et al. MaTOS: Traduction automatique pour la science ouverte. 18e Conférence en Recherche d'Information et Applications -- 16e Rencontres Jeunes Chercheurs en RI -- 30e Conférence sur le Traitement Automatique des Langues Naturelles -- 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues, Jun 2023, Paris, France. pp.8-15. ⟨hal-04131594⟩
- Shu Okabe, François Yvon. Production automatique de gloses interlinéaires à travers un modèle probabiliste exploitant des alignements. CORIA-TALN 2023 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), Jun 2023, Paris, France. pp.262-274. ⟨hal-04130176⟩
- Gilles Adda, Ioana Vasilescu, François Yvon. Language Report French. Georg Rehm; Andy Way. European Language Equality. A Strategic Agenda for Digital Language Equality, Springer International Publishing, pp.139-142, 2023, Cognitive Technologies, 978-3-031-28818-0. ⟨10.1007/978-3-031-28819-7_16⟩. ⟨hal-04121465⟩
- François Yvon. Transformers in Natural Language Processing. Mohamed Chetouani; Virginia Dignum; Paul Lukowicz; Carles Sierra. Human-Centered Artificial Intelligence. Advanced Lectures, 13500, Springer International Publishing, pp.81-105, 2023, Lecture Notes in Computer Science, 978-3-031-24348-6. ⟨10.1007/978-3-031-24349-3_6⟩. ⟨hal-04224531⟩
- Philippe Langlais, François Yvon. For a common European framework for evaluating AI-based translation technologies. Rachele Raus. How artificial intelligence can further European multilingualism Strategic recommendations for European decision-makers, Università di Torino - Artificial Intelligence for European Integration; Ledizioni, pp.93-96, 2023, 9791256000142. ⟨hal-04392444⟩
- François Yvon. Evaluer, diagnostiquer et analyser la traduction automatique neuronale. FORUM. Revue internationale d’interprétation et de traduction / International Journal of Interpretation and Translation, 2022, 20 (2), pp.315-332. ⟨10.1075/forum.00023.yvo⟩. ⟨hal-03975750⟩