Title: Unsupervised Machine Translation
/ Itzulpen automatiko gainbegiratu gabea
Where: Teleconference: https://eu.bbcollab.com/guest/b22b606d9ae74bc5b3e067821c897617
Faculty of Informatics (UPV/EHU), Ada Lovelace room
Date: July 29, 2020, Wednesday, 11:00
Author: Mikel Artetxe Zurutuza
Supervisors: Eneko Agirre & Gorka Labaka
Languages: Basque (motivation, state of the art) and English (second half, papers, conclusions, ~11:30…)
Abstract:
The advent of neural sequence-to-sequence models has led to impressive progress in machine translation, with large improvements in standard benchmarks and the first solid claims of human parity in certain settings. Nevertheless, existing systems require strong supervision in the form of parallel corpora, typically consisting of several million sentence pairs. Such a requirement greatly departs from the way in which humans acquire language, and poses a major practical problem for the vast majority of low-resource language pairs.
The goal of this thesis is to remove the dependency on parallel data altogether, relying on nothing but monolingual corpora to train unsupervised machine translation systems. For that purpose, our approach first aligns separately trained word representations in different languages based on their structural similarity, and uses them to initialize either a neural or a statistical machine translation system, which is further trained through back-translation.
Mikel Artetxe's publications related to his PhD work:
- Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre (2020)
A Call for More Rigor in Unsupervised Cross-lingual Learning
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
- Mikel Artetxe, Gorka Labaka, Eneko Agirre (2019)
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5002–5007.
- Mikel Artetxe, Gorka Labaka, Eneko Agirre (2019)
An Effective Approach to Unsupervised Machine Translation
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 194–203.
- Mikel Artetxe, Gorka Labaka, Eneko Agirre (2018)
Unsupervised Statistical Machine Translation
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3632–3642, Brussels, Belgium, October-November. Association for Computational Linguistics.
- Mikel Artetxe, Gorka Labaka, Eneko Agirre (2018)
A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.
- Mikel Artetxe, Gorka Labaka, Eneko Agirre, Kyunghyun Cho (2018)
Unsupervised Neural Machine Translation
Sixth International Conference on Learning Representations (ICLR 2018).
- Mikel Artetxe, Gorka Labaka, Eneko Agirre (2018)
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pages 5012–5019.
- Mikel Artetxe, Gorka Labaka, Eneko Agirre (2017)
Learning bilingual word embeddings with (almost) no bilingual data
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.
- Mikel Artetxe, Gorka Labaka, Eneko Agirre (2016)
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289–2294. Austin, Texas. ISBN: 978-1-945626-25-8