Publications

See the full list of the IXA Group's publications on machine translation (MT): http://ixa.si.ehu.eus/node/4146/Itzulpen%20automatikoa?language=en

MAIN PUBLICATIONS

Conferences (GGS Class 1)

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre (2022) Principled Paraphrase Generation with Parallel Corpora. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre (2021) Beyond Offline Mapping: Learning Cross-lingual Word Embeddings through Context Anchoring. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, pp. 6479–6489. https://doi.org/10.18653/v1/2021.acl-long.506

Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre (2020) A Call for More Rigor in Unsupervised Cross-lingual Learning. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Mikel Artetxe, Gorka Labaka, Eneko Agirre (2020) Translation Artifacts in Cross-lingual Transfer Learning. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 7674–7684. https://www.aclweb.org/anthology/2020.emnlp-main.618

Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way (2020) Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 3898–3908. https://www.aclweb.org/anthology/2020.acl-main.359 DOI: 10.18653/v1/2020.acl-main.359

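Mikel Artetxe, Gorka Labaka, Eneko Agirre (2019) An Effective Approach to Unsupervised Machine Translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa, Eneko Agirre (2019) Analyzing the Limitations of Cross-lingual Word Embedding Mappings. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)

Mikel Artetxe, Gorka Labaka, Eneko Agirre (2019) Bilingual Lexicon Induction through Unsupervised Machine Translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)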

Journals JCR Q1

Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre (2020) Do all roads lead to Rome? Understanding the role of initialization in iterative back-translation. Knowledge-Based Systems, Volume 206 (online first). https://doi.org/10.1016/j.knosys.2020.106401

Xabier Soto, Olatz Perez de Viñaspre, Gorka Labaka, Maite Oronoz (2019) Neural Machine Translation of clinical texts between long distance languages. JAMIA (Journal of the American Medical Informatics Association), Volume 26, Issue 12, December 2019, pp. 1478–1487. https://doi.org/10.1093/jamia/ocz110

Gamallo, Pablo, Susana Sotelo, José Ramom Pichel, Mikel Artetxe (2019) Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora. Computational Linguistics. DOI: 10.1162/COLI_a_00353. ISSN: 0891-2017. Q1 in Linguistics (SSCI).

Journals JCR Q2

Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767

Gamallo, Pablo (2021) "Compositional distributional semantics with syntactic dependencies and selectional preferences". Applied Sciences, 11(1), p. 5743. ISSN: 2076-3417. Q2 (JCR).

Journals JCR Q3

Pichel, J.R., Pablo Gamallo, Iñaki Alegria, Marco Neves (2020) A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity. Journal of Quantitative Linguistics, first online 1 March 2020. DOI: 10.1080/09296174.2020.1732177. ISSN: 0929-6174.

Gamallo, Pablo, J-R. Pichel, Iñaki Alegria (2020) Measuring Language Distance of Isolated European Languages, Information, 11(4) pp. 181. DOI: 10.3390/info11040181. ISSN: 2078-2489.

Other Journals

Pablo Gamallo, Gorka Labaka (2021) Using Dependency-Based Contextualization for transferring Passive Constructions from English to Spanish. Procesamiento del Lenguaje Natural, Issue no. 66, March 2021, pp. 53-64. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6322
SJR: 0.270 (Computer Science Applications Q3; Language and Linguistics Q2)

Pichel, J.R., Pablo Gamallo, Marco Neves, Iñaki Alegria (2020) Distância diacrónica automática entre variantes diatópicas do português e do espanhol. Linguamática, 12(1), pp. 117-126. DOI: 10.21814/lm.12.1.319. ISSN: 1647-0818.
SJR: 0.121 (Language and Linguistics Q3)

Uxoa Iñurrieta (2020) Identification and translation of verb+noun multiword expressions: a Spanish-Basque study. Procesamiento del Lenguaje Natural, 64, pp. 123-126.
SJR: 0.270 (Computer Science Applications Q3; Language and Linguistics Q2)

Mikel Artetxe, Gorka Labaka, Eneko Agirre (2019) Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text. Procesamiento del Lenguaje Natural.
SJR: 0.270 (Computer Science Applications Q3; Language and Linguistics Q2)

Alberto Poncelas, Kepa Sarasola, Meghan Dowling, Andy Way, Gorka Labaka, Iñaki Alegria (2019) Adapting NMT to caption translation in Wikimedia Commons for low-resource languages. Procesamiento del Lenguaje Natural, Issue no. 63, September 2019, pp. 33-40.
SJR: 0.270 (Computer Science Applications Q3; Language and Linguistics Q2)

Nora Aranberri (2020) Can translationese features help users select an MT system for post-editing? Procesamiento del Lenguaje Natural, 64, pp. 93-100.
SJR: 0.270 (Computer Science Applications Q3; Language and Linguistics Q2)

Book chapters

Cristina Cumbreño, Nora Aranberri (2021) What Do You Say? Comparison of Metrics for Post-editing Effort In: Carl M. (eds) Explorations in Empirical Translation Process Research. Machine Translation: Technologies and Applications, vol 3. Springer, Cham. pp 57-79. https://doi.org/10.1007/978-3-030-69777-8_3

(2022) Development of a Machine Translation system for promoting the use of a low resource language in the clinical domain: the case of Basque. Chapter 7 in Natural Language Processing In Healthcare: A Special Focus on Low Resource Languages. Accepted for publication in August 2022. Routledge, Taylor & Francis Group. https://www.routledge.com/Natural-Language-Processing-In-Healthcare-A-Sp...

Conferences or workshops

Cross-lingual Diachronic Distance: Application to Portuguese and Spanish
Conference/workshop: SEPLN 2019
Type of contribution: oral presentation
Authors: José Ramom Pichel, Pablo Gamallo, Iñaki Alegria
Year: 2019

LINGUATEC: Desarrollo de recursos lingüísticos para avanzar en la digitalización de las lenguas de los Pirineos [Development of linguistic resources to advance the digitalization of the languages of the Pyrenees]
Conference/workshop: SEPLN 2019
Type of contribution: poster
Authors: Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille
Year: 2019

Adapting NMT to caption translation in Wikimedia Commons for low-resource languages
Conference/workshop: SEPLN 2019
Type of contribution: oral presentation
Authors: Alberto Poncelas, Kepa Sarasola, Meghan Dowling, Andy Way, Gorka Labaka, Iñaki Alegria
Year: 2019

Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish
Conference/workshop: SEPLN 2019, Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation
Type of contribution: oral presentation
Authors: Xabier Soto, Olatz Perez de Viñaspre, Maite Oronoz, Gorka Labaka
Year: 2019

What Do You Say? Comparison of Metrics for Post-editing Effort
Conference/workshop: MT Summit 2019, Second MEMENTO Workshop on Modelling Parameters of Cognitive Effort in Translation Production
Type of contribution: oral presentation
Authors: Cristina Cumbreño, Nora Aranberri
Year: 2019

The workshop presentation was later published in this book:
Cumbreño, C., Aranberri, N. (2021). What Do You Say? Comparison of Metrics for Post-editing Effort. In: Carl, M. (eds) Explorations in Empirical Translation Process Research. Machine Translation: Technologies and Applications, vol 3. Springer, Cham. https://doi.org/10.1007/978-3-030-69777-8_3

Ixamed's submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation
Conference/workshop: Proceedings of the Fifth Conference on Machine Translation (WMT20)
Type of contribution: oral presentation
Authors: Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz
Year: 2020

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining
Conference/workshop: ACL Student Research Workshop
Type of contribution: oral presentation
Authors: Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar
Year: 2020

With or without you? Effects of using machine translation to write flash fiction in the foreign language
Conference/workshop: 22nd Annual Conference of the European Association for Machine Translation
Type of contribution: oral presentation
Authors: Nora Aranberri
Year: 2020

(Machine) Translation practices and quality perceptions of Basque speakers
Conference/workshop: ICML XVIII (18th International Conference on Minority Languages)
https://icml2021.eus/programa/?lang=en
Type of contribution: oral presentation
Authors: Nora Aranberri, Uxoa Iñurrieta
Year: 2021

PhD Theses

Name: Mikel Artetxe Zurutuza
Supervisors: Eneko Agirre and Gorka Labaka
Title: Unsupervised Machine Translation
Institution: UPV/EHU
Completed: 2020/07/29
- Best European thesis award (2020 Artificial Intelligence Dissertation Award), granted by EurAI (the European Association for Artificial Intelligence)
https://www.eurai.org/awards_and_grants/dissertation_award/latest
- Sociedad Científica Informática de España-Fundación BBVA Research Award, Young Computer Science Researchers category, 2021
https://www.fbbva.es/noticias/fallados-los-premios-de-investigacion-soci...
https://www.youtube.com/watch?v=1BjRn1n65Wk
- SEPLN 2021 Award for the Best PhD Thesis in Natural Language Processing
http://www.sepln.org/index.php/investigacion/premio-sepln

Name: Uxoa Iñurrieta Urmeneta
Supervisors: Itziar Aduriz, Gorka Labaka
Title: Aditza+izena Unitate Fraseologikoak gaztelaniatik euskarara: azterketa eta tratamendu konputazionala / Verb+Noun Multiword Expressions: A linguistic analysis for identification and translation
Institution: UPV/EHU (international thesis)
Completed: 2019
https://www.unibertsitatea.net/blogak/ixa/2020/09/02/izenaditz-konbinazi...

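Name: José Ramom Pichel
Supervisors: Iñaki Alegria and Pablo Gamallo
Title: Medidas de distância entre línguas baseadas em corpus. Aplicação à linguística histórica do galego, português, espanhol e inglês [Corpus-based measures of distance between languages: application to the historical linguistics of Galician, Portuguese, Spanish and English]
Institution: UPV/EHU (international thesis)
Completed: 2020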
https://www.unibertsitatea.net/blogak/ixa/2020/10/27/tesia-hizkuntzen-ar...
https://addi2.ehu.es/handle/10810/50329

Name: Xabier Soto
Supervisors: Gorka Labaka and Maite Oronoz
Title: Txosten klinikoak euskararen eta gazteleraren artean itzultzen laguntzeko corpusaren bilketa eta itzultzaile automatikoaren garapena / Recopilación de corpus para facilitar la traducción de informes clínicos entre euskera y castellano y desarrollo del traductor automático [Corpus collection to support the translation of clinical reports between Basque and Spanish, and development of a machine translation system]
Institution: UPV/EHU
Completed: 2021 (not during the project)
http://www.ixa.eus/node/13466?language=eu

Name: Aitor Ormazabal
Supervisors: Gorka Labaka and Eneko Agirre
Title: Analyzing the Limitations of Cross-lingual Word Embedding Mappings
Institution: UPV/EHU
Status: not completed during the project