The members of the Ixa group and their collaborators will present five papers at the 58th Annual Meeting of the Association for Computational Linguistics (ACL), one of the most important conferences on Natural Language Processing. The conference was to be held in July in Seattle, but this year it will take place online.
The accepted papers are the following:
– Selecting Backtranslated Data from Multiple Sources for improved Neural Machine Translation (Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way): We analyse the impact that data back-translated with diverse systems has on Basque–Spanish (eu-es) and German–English (de-en) clinical-domain NMT, and employ data selection (DS) to optimise the synthetic corpus. We further rescore the output of DS by considering the quality of the MT systems used for back-translation and the lexical diversity of the resulting corpora.
– On the Cross-lingual Transferability of Monolingual Representations (Mikel Artetxe, Sebastian Ruder, Dani Yogatama): We challenge common beliefs about why multilingual BERT works by showing that a monolingual BERT model can also be transferred to new languages at the lexical level.
– A Call for More Rigor in Unsupervised Cross-lingual Learning (Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre): In this position paper, we review the motivations, definitions, approaches and methodology of unsupervised cross-lingual learning, and call for a more rigorous position on each of them.
– DoQA – Accessing Domain-Specific FAQs via Conversational QA (Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre): We present DoQA, a dataset for accessing FAQs via conversational Question Answering, showing that it is possible to build high-quality conversational QA systems for accessing FAQs without in-domain training data.
– A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation (Jan Deriu, Katsiaryna Mlynchyk, Philippe Schläpfer, Alvaro Rodrigo, Dirk von Grünigen, Nicolas Kaiser, Kurt Stockinger, Eneko Agirre, Mark Cieliebak): We introduce a novel methodology for efficiently constructing a corpus for question answering over structured data, achieving a threefold speed-up in manual annotation compared to previous schemes such as Spider. Our method also produces fine-grained alignments of query tokens to parsing operations. We train a state-of-the-art semantic parsing model on our data and show that our corpus is a challenging dataset and that the token alignments can be leveraged to significantly increase performance.
Congratulations to all the authors!