Machine Translation – Ixa Group. Language Technology.
News from the Ixa Group in the University of the Basque Country
https://www.ehu.eus/ehusfera/ixa
Tue, 28 Jul 2020 14:05:43 +0000

PhD Thesis: Unsupervised Machine Translation (Mikel Artetxe, 2020/07/29)
https://www.ehu.eus/ehusfera/ixa/2020/07/28/phd-thesis-unsupervised-machine-translation-mikel-artetxe-2020-07-29/
Tue, 28 Jul 2020 13:37:29 +0000

Title: Unsupervised Machine Translation / Itzulpen automatiko gainbegiratu gabea

Where: Teleconference: https://eu.bbcollab.com/guest/b22b606d9ae74bc5b3e067821c897617
Faculty of Informatics (UPV/EHU), Ada Lovelace room
Date: July 29, 2020, Wednesday, 11:00
Author: Mikel Artetxe Zurutuza
Supervisors: Eneko Agirre & Gorka Labaka
Languages: Basque (motivation, state of the art) and English (second half, papers, conclusions, ~11:30…)

https://github.com/artetxem

Abstract:

The advent of neural sequence-to-sequence models has led to impressive progress in machine translation, with large improvements in standard benchmarks and the first solid claims of human parity in certain settings. Nevertheless, existing systems require strong supervision in the form of parallel corpora, typically consisting of several million sentence pairs. Such a requirement greatly departs from the way in which humans acquire language, and poses a major practical problem for the vast majority of low-resource language pairs.

The goal of this thesis is to remove the dependency on parallel data altogether, relying on nothing but monolingual corpora to train unsupervised machine translation systems. For that purpose, our approach first aligns separately trained word representations in different languages based on their structural similarity, and uses them to initialize either a neural or a statistical machine translation system, which is further trained through back-translation.
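
As a toy illustration of what matching words by "structural similarity" can mean, the sketch below pairs words across two embedding spaces using only each word's similarities to the other words of its own language. It is not the thesis's actual algorithm. The data, the function names and the assumption that both vocabularies have the same size are all hypothetical, chosen to keep the example small.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_profiles(vectors):
    """For each word, its sorted similarities to every word of the same language."""
    return [sorted(cosine(u, v) for v in vectors) for u in vectors]

def match_by_structure(src_vectors, tgt_vectors):
    """Pair each source word with the target word whose profile is closest."""
    src_profiles = similarity_profiles(src_vectors)
    tgt_profiles = similarity_profiles(tgt_vectors)

    def distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return [
        min(range(len(tgt_profiles)), key=lambda j: distance(p, tgt_profiles[j]))
        for p in src_profiles
    ]

def unit(angle_degrees):
    """Hypothetical 2-d 'embedding': a point on the unit circle."""
    a = math.radians(angle_degrees)
    return (math.cos(a), math.sin(a))

# Hypothetical source-language embeddings for four words.
src = [unit(0), unit(20), unit(70), unit(160)]
# Hypothetical target-language embeddings: the same geometry rotated by
# 40 degrees, with the words listed in a different order (source 2, 0, 3, 1).
tgt = [unit(70 + 40), unit(0 + 40), unit(160 + 40), unit(20 + 40)]

matches = match_by_structure(src, tgt)
print(matches)  # prints [1, 3, 0, 2]
```

The sorted similarity profile of a word does not change when the whole space is rotated, which is why the two independently "trained" spaces can be matched without any bilingual dictionary in this toy setting. Real embeddings are only approximately isomorphic, so a matching like this would at best serve as a noisy starting point.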

Mikel Artetxe publications related to his PhD work:

QTLeap european project: Meeting in Donostia-San Sebastian (June/30 – July/1)
https://www.ehu.eus/ehusfera/ixa/2015/06/29/qtleap-european-project-meeting-in-donostia-san-sebastian-june30-july1/
Mon, 29 Jun 2015 06:53:19 +0000

The IXA Group is hosting a meeting of the European project QTLeap in Donostia-San Sebastián from Monday, June 29th to Wednesday, July 1st.

Recently, at the beginning of June, the project successfully organized SSST-9, the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation, in Denver, Colorado, co-located with NAACL 2015 (June 4, 2015).

The QTLeap project (Quality Translation by Deep Language Engineering Approaches) investigates and develops an innovative methodology for Machine Translation that explores new solutions, using deep language engineering approaches to achieve higher quality translations. The project is run by a European consortium of eight partners: Bulgarian Academy of Sciences, Charles University in Prague, German Research Center for Artificial Intelligence, Higher Functions Lda., Humboldt University in Berlin, University of the Basque Country, University of Groningen and University of Lisbon. For more information and contact details please visit: qtleap.eu.

Talk: Discourse Structure in Machine Translation Evaluation (L. Marquez, 2015/06/25)
https://www.ehu.eus/ehusfera/ixa/2015/06/23/talk-discourse-structure-in-machine-translation-evaluation-l-marquez-20150615/
Tue, 23 Jun 2015 14:19:03 +0000

Speaker: Lluis Màrquez, Arabic Language Technologies group, Qatar Computing Research Institute (QCRI)
Date: June 25th 2015, Thursday
Time: 12:00
Room: Room 3.2, Faculty of Informatics (UPV/EHU)
Title: Discourse Structure in Machine Translation Evaluation

Abstract:

In this talk I will describe our research at the Arabic Language Technologies group of the Qatar Computing Research Institute on applying discourse-level information to automatic machine translation (MT) evaluation.
I will start by describing some variants of a discourse-aware similarity measure, which uses the ‘all-subtree’ convolution kernel to compare discourse parse trees in accordance with Rhetorical Structure Theory. Then, I will show that these measures help improve a number of existing MT evaluation metrics, both at the segment and at the system level, by increasing their correlation with human judgments. This indicates that discourse information is complementary to the state-of-the-art metrics, and thus could be taken into account in the development of richer evaluation measures.
In the second part I will present a strong and robust evaluation measure that combines the discourse-based similarity with other metrics from the Asiya MT evaluation toolkit and tunes the weights of the combination on actual human judgments. Experiments on the WMT12, WMT13, and WMT14 metrics shared task datasets show correlations with human judgments that outperform the state of the art, both at the segment and at the system level, with very consistent results across language pairs.
In the final part of the talk, I will introduce two preliminary attempts at learning metrics from finer-grained features for pairwise quality comparison. In the first one, we use preference reranking with kernels to learn from tree-structured representations. In the second one, we use a neural network architecture to learn from a distributed representation of syntax and semantics. Both frameworks are designed to be general and extensible from MT evaluation to quality estimation and machine translation.
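
To make the idea of combining metrics and checking them against human judgments concrete, here is a minimal sketch. All scores are invented for illustration and the weights are fixed by hand. The measure described in the talk tunes its weights on human judgments and draws on the Asiya toolkit, neither of which is shown here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def combine(metric_scores, weights):
    """Weighted sum of several metric scores for each segment."""
    return [sum(w * s for w, s in zip(weights, row)) for row in metric_scores]

# Hypothetical per-segment scores: (existing metric, discourse-based similarity).
segments = [
    (0.30, 0.20),
    (0.50, 0.70),
    (0.55, 0.40),
    (0.80, 0.90),
]
# Hypothetical human quality judgments for the same four segments.
human = [1.0, 3.0, 2.0, 4.0]

baseline_only = [row[0] for row in segments]
combined = combine(segments, weights=(0.5, 0.5))  # hand-picked weights

r_baseline = pearson(human, baseline_only)
r_combined = pearson(human, combined)
print(r_baseline < r_combined)  # prints True on this toy data
```

On this invented data the combined score tracks the human ranking more closely than the single metric does, which is the kind of effect the abstract reports at much larger scale on the WMT datasets.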

Short bio:

Principal Scientist at the Arabic Language Technologies group of the Qatar Computing Research Institute (QCRI) since 2013. Previously, Associate Professor at the Technical University of Catalonia (UPC, 2000-2013). He holds a PhD in Computer Science from UPC (1999). His research focuses on Machine Learning methods for Natural Language structure prediction problems, including syntactic and semantic parsing. He works on applications in statistical machine translation and its evaluation, and question answering in community forums. He has 120+ papers in Natural Language Processing and Machine Learning journals and conferences. He has been General and Program Co-chair of major conferences in the area (EMNLP, EACL, CoNLL, EAMT, etc.), and has also held several organizational roles in ACL and EMNLP. He was co-organizer of various international evaluation tasks at Senseval/SemEval (2004, 2007, 2010, 2015) and CoNLL shared tasks (2004-2005, 2008-2009). Secretary and President of the ACL SIG on Natural Language Learning (SIGNLL) in the period 2007-2011, he currently serves as President of the European Chapter of the ACL (2015-2017). He has been Guest Editor of special issues of Computational Linguistics, LRE, JNLE, and JAIR in the period 2007-2015. He has participated in 18 national and EU research projects, acting as the principal site researcher in 10 of them.

10.000 downloads for Mitzuli translator app
https://www.ehu.eus/ehusfera/ixa/2015/06/18/10-000-downloads-for-mitzuli-translator-app/
Thu, 18 Jun 2015 19:10:23 +0000

Do you have the Mitzuli app on your Android phone?
This app lets you translate text, audio and images between 50 language pairs. It’s free and… it was created by Mikel Artetxe, a member of the IXA Group and a student in our HAP-LAP master’s programme!

It now has more than 10.000 downloads in just one month!
Thanks and congratulations, Mikel!

 

Some news:

Ixa Group is a new institutional member of EAMT
https://www.ehu.eus/ehusfera/ixa/2013/01/09/ixa-eamt/
Wed, 09 Jan 2013 15:12:24 +0000

In 2012 the Ixa Group (Ixa Taldea) became the 11th institutional member of the European Association for Machine Translation (EAMT), the organization that serves the growing community of people interested in MT and translation tools, including users, developers, and researchers of this increasingly viable technology.

The EAMT is one of three regional associations of the International Association for Machine Translation (IAMT). Its sister organizations are the Association for Machine Translation in the Americas (AMTA) and the Asia-Pacific Association for Machine Translation (AAMT).

Among other activities, the EAMT organizes the biennial MT Summit and the annual EAMT conferences, maintains the MT-List mailing list, and compiles listings of companies and products, which are distributed free or at nominal cost to its members (Compendium of Translation Software).


The current 11 corporate and institutional members are the following:

BerbaTek project’s results and demos.
https://www.ehu.eus/ehusfera/ixa/2012/02/10/berbatek-projects-results-and-demos/
Fri, 10 Feb 2012 16:36:38 +0000

BerbaTek is a recently finished three-year strategic research project (2009-2011) funded by the Industry Department of the Basque Government. To carry out the project, a consortium was formed by the Elhuyar Foundation, the IXA and Aholab research groups of the UPV/EHU (University of the Basque Country), the technology centre Vicomtech and the foundation Tecnalia Research & Innovation.

 

Yesterday, the results of this project were presented at a press conference with representatives of the Basque Government. Throughout the BerbaTek project we have created several language tools, resources and demos that show the potential of integrating language, voice and multimedia technologies to build applications for the areas that make up the language industry: translation, content and teaching. Three demos were presented at the press conference:

  • Automatic dubbing demo. Automatic dubbing of films remains a difficult challenge (different voices, colloquial language, different speeds), but for some types of documentaries (single speaker, voice-over, lip synchronization unnecessary or unimportant) we have built a demo that performs satisfactorily. Given a documentary in Spanish and its transcription (which can be obtained automatically with any of the dictation programs for Spanish on the market), Vicomtech-IK4’s temporal alignment technology creates a subtitle file: a transcription with time marks for the beginning and end of each sentence. Then the Matxin MT system, developed by the IXA group, automatically translates the subtitles into Basque, and Aholab’s text-to-speech technology produces the synchronized voice. We have successfully applied this demo to the single-speaker sections of the television program Teknopolis, produced by Elhuyar. This demo can be seen at work here.
  • Semantic multimedia search engine for science and technology content. This search engine is based on WNTerm, an ontology specialized in science and technology which was created by Elhuyar and IXA. It is a network in which scientific and technological terms are semantically related to each other through subclasses, synonyms, etc. A new augmented version will be presented next month.
  • Personal teacher for language learning. For the field of education, we have created a demo of a personal tutor for language learning. The tutor is a 3D avatar developed by Vicomtech-IK4 that shows emotions, can speak Basque and can understand what is said in Basque, using Aholab’s technology. The tutor assists us in various tasks: we can do grammar exercises (verb conjugation, word inflection) and reading comprehension exercises (filling in gaps in a text, choosing from several options) that are created automatically from texts using technology from IXA; we can evaluate our pronunciation with Aholab technology; and it helps us when writing texts, with word inflection, the writing of numbers and dictionary lookups, by means of technology from IXA and Elhuyar. For the moment this demo works only locally, but it will be available online by next spring.
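
The flow of the automatic dubbing demo (timed subtitles, then translation, then speech synthesis) can be sketched as a small pipeline. This is only an illustration of the data flow. The `Subtitle` type, the `dub` function and the two stand-in callables are hypothetical and do not correspond to the real interfaces of the Vicomtech-IK4, Matxin or Aholab components.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Subtitle:
    """One sentence of the transcription with its time marks (in seconds)."""
    start: float
    end: float
    text: str

def dub(subtitles: List[Subtitle],
        translate: Callable[[str], str],
        synthesize: Callable[[str, float], bytes]) -> List[Tuple[float, bytes]]:
    """Translate each timed sentence and synthesize audio for its time slot."""
    track = []
    for sub in subtitles:
        translated = translate(sub.text)         # MT step (Spanish to Basque in the demo)
        duration = sub.end - sub.start           # time available for the spoken sentence
        audio = synthesize(translated, duration) # TTS step
        track.append((sub.start, audio))         # audio is placed at the original start time
    return track

# Stand-ins for the real components, used only to make the sketch runnable.
def fake_translate(text: str) -> str:
    return f"[eu] {text}"

def fake_synthesize(text: str, duration: float) -> bytes:
    return text.encode("utf-8")

subs = [
    Subtitle(0.0, 2.5, "Hola"),
    Subtitle(2.5, 6.0, "Bienvenidos al programa"),
]
track = dub(subs, fake_translate, fake_synthesize)
print(len(track))  # prints 2
```

Passing the translation and synthesis steps in as callables keeps the sketch independent of any particular system. The time marks are what allow the synthesized voice to stay aligned with the original video.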

The news was covered by the media today:

Further information about this project can be found at the BerbaTek project’s website.

Talk. Tegau Andrews. An overview of Welsh language technologies. (2011/11/02)
https://www.ehu.eus/ehusfera/ixa/2011/10/28/794/
Fri, 28 Oct 2011 12:52:23 +0000

Ixa Group has often collaborated with Bangor University in the development of language technology for less-resourced languages, mainly Basque and Welsh, within the framework of SALTMIL.
Briony Williams, Delith Prys and Gruff Prys are our Welsh contacts.
Tegau Andrews from Bangor University will be with us next week, and we have scheduled this talk:

Speaker: Tegau Andrews (Bangor University, Wales)
Uned Technolegau Iaith  /  Language Technologies Unit
Prifysgol Bangor     /   Bangor University

When: November 2, Wednesday
Where: Room 3.2
Time: 15.00
Title: From terminology standardization systems to machine translation: An overview of Welsh language technologies

Abstract:

“An endangered language will progress if its speakers can make use of electronic technology”, so postulates Wales-based linguistics professor David Crystal (Language Death, 2000: 141). Welsh, spoken by 20.8% of the population of Wales (Census 2001), is classed as a vulnerable language by UNESCO, yet it is the Welsh Government’s stated aim to make Wales a truly bilingual nation.

This talk will focus on the progress being made in developing language technologies for Welsh speakers. It will range over topics such as Welsh machine translation, computer-aided translation tools, text-to-speech technology, terminology portals and e-learning resources, and present an overview of the work being done at the Terminology and Language Technologies Unit at Bangor University. The aim of such work is to enable and encourage Welsh speakers to use electronic technology in their own language.

Talk. Daniele Pighin. Semantic Structures in Translation Ranking (2011/05/31)
https://www.ehu.eus/ehusfera/ixa/2011/05/27/daniele-pighin/
Fri, 27 May 2011 11:22:05 +0000

Speaker: Daniele Pighin
          NLPRG, TALP
          Technical University of Catalonia, UPC
Date: May 31, 2011
Time: 11:30
Where: Computer Science Faculty, Room 3.2

Title
   Automatic Projection of Semantic Structures:
      an Application to Pairwise Translation Ranking
 
Abstract
The ability to automatically assess the quality of translation
hypotheses is a key requirement towards the development of accurate and
dependable translation models. While it is largely agreed that proper
transfer of predicate-argument structures from source to target is a
very strong indicator of translation quality, especially in relation to
adequacy, the incorporation of this kind of information in the
Statistical Machine Translation (SMT) evaluation pipeline is still
limited to a few isolated cases.

We present a model for the inclusion of semantic role annotations in the
framework of confidence estimation for machine translation. The model
has several interesting properties:
   1) it only requires a linguistic processor on the (generally
well-formed) source side of the translation;
   2) it does not directly rely on properties of the translation model
(hence, it can be applied beyond phrase-based systems);
   3) it is inherently extendable to cope with different kinds of
sequential annotations, e.g., POS tags.
These features make it potentially appealing for system ranking,
translation re-ranking and user feedback evaluation. Preliminary
experiments in pairwise hypothesis ranking on five confidence estimation
benchmarks show that the model has the potential to capture salient
aspects of translation quality.

International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011)
https://www.ehu.eus/ehusfera/ixa/2011/05/11/lihmt-2011/
Wed, 11 May 2011 09:21:45 +0000

Ixa Group, in collaboration with the TALP Centre of the Technical University of Catalonia, is organizing a one-day workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT-2011).

The workshop will be held in Barcelona on Friday, November 18, 2011.

Paper submission deadline is September 9 2011.  See: http://ixa2.si.ehu.es/lihmt2011/

This is part of the dissemination effort of our OpenMT-2 project.

Talk. Lluís Màrquez. Automatic evaluation in Machine Translation: Towards combined linguistically-motivated measures (2011/05/10)
https://www.ehu.eus/ehusfera/ixa/2011/05/09/marquez/
Mon, 09 May 2011 10:10:00 +0000

Speaker: Lluís Màrquez
 NLPRG, TALP
Technical University of Catalonia, UPC

Date: May 10, 2011
Time: 15:30
Where: Computer Science Faculty, Room 3.2

Automatic evaluation in Machine Translation:
Towards combined linguistically-motivated measures

Automatic evaluation plays a very important role in the development and comparison of machine translation systems. In this talk we will give an overview of the current trend of using linguistically-guided evaluation measures based on several linguistic layers and their combination. We will also talk about confidence estimation measures, a particular subset of measures that assess output quality without the need for reference translations. Finally, we will outline the role of evaluation measures within the FAUST European project (Feedback Analysis for User Adaptive Statistical Translation; http://www.faust-fp7.eu/), focusing on the use of user feedback to guide the combination of measures.
