Welcome to TAdeep (MINECO-FEDER project)

Title: TAdeep: Deep machine translation (Itzulpengintza automatiko sakona)
Official Code: TIN2015-70214-P
Main researcher: Kepa Sarasola
Funding body: MINECO-FEDER
Start: 2016/01/01
End: 2018/12/31

In 2015, high-quality machine translation (MT) is still a challenge. Users, whether companies or individuals, are aware of both the benefits and the limitations of these systems. Whereas companies focus on increasing productivity by combining translation memories, CAT tools and post-editing environments, individual users rely on MT systems extensively even when the quality does not reach the desired level.

Based on our previous work and results in the TACARDI project (MINECO, TIN2012-38523-C02-01) and our current work on the QTLeap European project (FP7-ICT-2013.4.1-610516), we propose to investigate techniques that improve the state of the art of MT systems by focusing on two important aspects:

Deep analysis and deep NLP. Neural networks, applied through word embeddings and deep learning, have revolutionized the area of NLP in the last three years. In addition, our work within the QTLeap project on adapting the Depfix and TectoMT tools to the English-Spanish and English-Basque language pairs, using deep syntax and semantics, provides an exceptional test-bed for new advances in the area.
Domain-specific MT. Given the current output quality of MT systems, appropriate domain adaptation is the best guarantee of quality improvement: technical domains such as the IT domain explored in the QTLeap project, social networks explored in the TACARDI project, or other highly topical domains such as the medical or services domains can achieve improvements of commercial value.

The working languages of the project will be mainly English, Spanish and Basque. The first two offer large quantities of data to exploit during research and have strong potential to reach the market. Basque, in turn, poses a research challenge given its rich morphology, free word order and fewer available resources, which makes it an ideal setting to explore how well the project's outcomes generalise to other language pairs.

The IXA group at UPV/EHU has the know-how and experience required to undertake this project. The group includes not only experts in MT but also experts in morphology, syntax, semantics and machine learning.

The collaboration with Fundación Elhuyar complements the strong research capacity of the IXA group and is extremely beneficial to the project in three main respects:

Provision of resources (corpora, lexicons...) via Web as Corpus, which will provide us with representative resources for the domains of the project.

Evaluation of results. A department within Fundación Elhuyar has extensive experience in evaluation and in MT post-editing.
Access to the market and prototype testing. They collaborate with the well-known Fundación Consumer in a project to adapt MT to the services domain.

Here is a list of indicators that show the current interest MT generates within R+D+i:

The "Strategic Research Agenda for Multilingual Europe 2020" by META-NET
The European report of LT-Innovate 2013 "Status and Potential of the European Language Technology Markets"
The North American Chapter of the Association for Computational Linguistics (NAACL)

This project is closely related to two of the societal challenges within the Spanish Strategy for Research, Development and Innovation, namely "Social change and innovation" and "Economy and digital society".

KEY WORDS: MACHINE TRANSLATION, DEEP LEARNING, TECTOMT