The MaTOS Project – Machine Translation for Open Science
Scientific English is the lingua franca used in many scientific fields to publish and communicate research results. However, for these results to be accessible to students, science journalists or decision-makers, they often need to be translated. The language barrier is therefore an obstacle that limits or slows down the dissemination of scientific knowledge. Can machine translation help to overcome it?
The MaTOS project – Machine Translation for Open Science – is a project funded by the ANR (the French National Research Agency) that seeks to improve the circulation and dissemination of scientific knowledge through better machine translation. Its goal is to build an experimental machine translation tool for scientific texts, a domain where this technology sometimes struggles.
Coordinated by François Yvon, researcher at ISIR (MLIA team) at Sorbonne University, the MaTOS project brings together three other partners: CLILLAC (Center for Inter-Language Linguistics, Lexicology, English Linguistics and Corpus Workshop) – specialists in technical and scientific translation, Inria – specialist in natural language processing and machine translation, and Inist (Institute for Scientific and Technical Information) – specialist in scientific documentation.
Description of the MaTOS project, by François Yvon, project coordinator.
What is the project about?
The MaTOS (Machine Translation for Open Science) project aims to develop new methods for the machine translation (MT) of complete documents. It addresses both terminology modeling and the processing of discourse and its organization, within an automatic text generation framework. It also includes a component devoted to the study of evaluation methods and a large-scale experiment on the HAL archive.
What is the objective of the project?
MaTOS aims to develop new methods for the machine translation (MT) of complete scientific documents, as well as automatic metrics to evaluate the quality of the translations produced. Our main application target is the translation of scientific articles between French and English, where linguistic resources can be exploited to obtain more reliable translations, both for publication support and for reading and text mining purposes. However, efforts to improve machine translation of complete documents are hampered by the inability of existing automatic metrics to detect weaknesses in the systems and to identify the best ways to remedy them. The MaTOS project proposes to address both of these challenges head on.
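To illustrate the kind of document-level weakness that sentence-level metrics can miss, here is a minimal sketch of a terminology consistency score. It is not a MaTOS metric: the function `term_consistency`, the tiny glossary `variants` and the example sentences are all hypothetical, and matching is done by naive substring search. For each source term, the score is the share of its occurrences translated with the most frequent target variant across the document.

```python
from collections import Counter


def term_consistency(sentence_pairs, variants):
    """Per source term, share of occurrences rendered with the dominant target variant.

    sentence_pairs: list of (source_sentence, translated_sentence) for one document.
    variants: dict mapping a source term to its acceptable target renderings
              (a hypothetical glossary, assumed to be given).
    """
    scores = {}
    for src_term, tgt_variants in variants.items():
        # Try longer variants first so "neural network" is not counted as "neural net".
        ordered = sorted(tgt_variants, key=len, reverse=True)
        counts = Counter()
        for src, tgt in sentence_pairs:
            if src_term in src.lower():
                for variant in ordered:
                    if variant in tgt.lower():
                        counts[variant] += 1
                        break
        total = sum(counts.values())
        if total:
            scores[src_term] = max(counts.values()) / total
    return scores


# Hypothetical three-sentence document: each translation is acceptable on its own,
# but the same French term is rendered in two different ways.
pairs = [
    ("Le réseau de neurones est entraîné.", "The neural network is trained."),
    ("Ce réseau de neurones converge vite.", "This neural net converges quickly."),
    ("Le réseau de neurones est évalué.", "The neural network is evaluated."),
]
variants = {"réseau de neurones": ["neural network", "neural net"]}

scores = term_consistency(pairs, variants)
print(scores)
```

A score of 1.0 means a term is always translated the same way; lower values flag terminological variation that only becomes visible when the whole document is considered.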
What are the possible applications?
This project is part of a movement to automate the processing of scientific articles. The field of machine translation is no exception to this trend, especially in the biomedical domain. The applications are numerous: text mining, bibliometric analysis, automatic detection of plagiarism and of articles reporting falsified conclusions, etc. We wish to take advantage of the results of this work, but also to contribute to it in several ways:
- by developing new open resources for specialized machine translation;
- by improving the description of textual coherence markers for scientific articles through the study of terminological variations;
- by studying new multilingual processing methods for these documents;
- by proposing metrics dedicated to measuring progress in this type of task.
Ultimately, improved translation should make the circulation and dissemination of scientific knowledge smoother.
This project is part of the ongoing work on language modeling and automatic text generation methods in the MLIA team of ISIR, and adds a multilingual dimension to existing or past studies on the generation of summaries or texts from tabular data.
Scientific contact: François Yvon, CNRS research director