Publications – Ixa Group. Language Technology.

IXAmBERT: Good news for languages with few resources!

Wed, 30 Sep 2020 18:10:32 +0000

Good news for languages with few resources! Pre-trained Basque monolingual and multilingual language models have proven to be very useful in NLP tasks for Basque! Even they have been created with a 500 times smaller corpus than the English one and with a 80 times smaller wikipedia.

An example of Conversational Question [...]]]> Good news for languages with few resources!
Pre-trained Basque monolingual and multilingual language models have proven to be very useful in NLP tasks for Basque!
Even they have been created with a 500 times smaller corpus than the English one and with a 80 times smaller wikipedia.

Five papers accepted at 58th annual meeting of the Association for Computational Linguistics

Tue, 05 May 2020 07:21:01 +0000

The members of the Ixa group and their collaborators will present five papers at 58th annual meeting of the Association for Computational Linguistics (ACL). ACL is one of the most important conferences on Natural Language Processing. It was to be held in July in Seattle, but this year it will be online.

Following, we present [...]]]> The members of the Ixa group and their collaborators will present five papers at 58th annual meeting of the Association for Computational Linguistics (ACL). ACL is one of the most important conferences on Natural Language Processing. It was to be held in July in Seattle, but this year it will be online.

One of the best three papers on Clinical NLP in 2017 was published by Ixa Group

Fri, 28 Jun 2019 19:51:40 +0000

A paper written by IXA members Arantza Casillas, Koldo Gojenola, Maite Oronoz and Alicia Perez, among the 3 best papers published in 2017 in the field of clinical Natural Language Processing.

The paper entitled “Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora“, by Pérez A, Weegar R, Casillas A, Gojenola K, [...]]]> A paper written by IXA members Arantza Casillas, Koldo Gojenola, Maite Oronoz and Alicia Perez, among the 3 best papers published in 2017 in the field of clinical Natural Language Processing.

Best Paper Award on CoNLL2018

Thu, 08 Nov 2018 19:42:57 +0000

Last week our colleagues Mikel Artetxe, Gorka Labaka, Iñigo Lopez-Gazpio, and Eneko Agirre were the recipients of the Best Paper Award in the 22nd Conference on Computational Natural Language Learning (CoNLL 2018) for the paper “Uncovering Divergent Linguistic Information in Word Embeddings with Lessons for Intrinsic and Extrinsic Evaluation”.

Congratulations!

Abstract:

Following the recent success of word embeddings, it has been argued that there is no such thing as an ideal representation for words, as different models tend to capture divergent and often mutually incompatible aspects like semantics/syntax and similarity/relatedness. In this paper, we show that each embedding model captures more information than directly apparent. A linear transformation that adjusts the similarity order of the model without any external resource can tailor it to achieve better results in those aspects, providing a new perspective on how embeddings encode divergent linguistic information. In addition, we explore the relation between intrinsic and extrinsic evaluation, as the effect of our transformations in downstream tasks is higher for unsupervised systems than for supervised ones.

UncoVec:
This is an open source implementation in GitHub of our word embedding post-processing and evaluation framework, described in the paper.

]]>

Science journal: ‘Ixa opens a new research avenue: Machine Translation without a dictionary?’

Wed, 29 Nov 2017 21:03:14 +0000

Science reported this week about the work recently published by our colleagues Mikel Artetxe, Eneko Agirre and Gorka Labaka: Artificial intelligence goes bilingual—without a dictionary

In October the 30th our three colleagues published a pre-print paper entitled Unsupervised Neural Machine Translation in collaboration with Kyunghyun Cho.

One day later G. Lample published another paper with similar contents entitled Unsupervised Machine Translation Using Monolingual Corpora Only. Both papers are under consideration at ICLR 2018.

Those are some sentences written by Matthew Hutson a freelance writer covering technology for Science:

[…] two new papers show that neural networks can learn to translate with no parallel texts—a surprising advance that could make documents in many languages more accessible.

[…] Imagine that you give one person lots of Chinese books and lots of Arabic books—none of them overlapping—and the person has to learn to translate Chinese to Arabic. That seems impossible, right?” says the first author of one study, Mikel Artetxe, a computer scientist at the University of the Basque Country (UPV) in San Sebastián, Spain. “But we show that a computer can do that.”

[…] “This is in infancy,” Artetxe’s co-author Eneko Agirre cautions. “We just opened a new research avenue, so we don’t know where it’s heading.”

[…] Artetxe says the fact that his method and Lample’s—uploaded to arXiv within a day of each other—are so similar is surprising. “But at the same time, it’s great. It means the approach is really in the right direction.”

Congratulations Mikel, Eneko, Gorka and Kyunghyun!

]]>

Best paper award in SEPLN2017

Tue, 26 Sep 2017 11:34:35 +0000

Last week, our colleagues Begoña Altuna, María Jesús Aranzabe, and Arantza Diaz de Ilarraza were awarded in Murcia with the best paper award in the 33rd INTERNATIONAL CONFERENCE OF THE SPANISH SOCIETY FOR NATURAL LANGUAGE PROCESSING (SEPLN 2017)

CONGRATULATIONS!

The paper is available here: EusHeidelTime: Time Expression Extraction and Normalisation for Basque

Temporal information helps to organise the information in texts as it places the actions and states in time. It is therefore very important to identify the time points and intervals in the text, as well as what times they refer to. We developed EusHeidelTime for Basque time expression extraction and normalisation. For it, we analysed time expressions in Basque, we created the rules and resources for the tool and we built corpora for development and testing. We finally ran an experiment to evaluate EusHeidelTime’s performance. We achieved satisfactory results and we proved the adaptability of the tool for morphologically rich languages.

]]>

Our papers in Japan (COLING 2016)

Mon, 12 Dec 2016 14:12:18 +0000

Those are our six papers in COLING 2016, taking place in Osaka, Japan, on Dec 11 2016.

Machine Learning for Metrical Analysis of English Poetry
Manex Agirrezabal, Inaki Alegria and Mans Hulden
Using Linguistic Data for English and Spanish Verb-Noun Combination Identification
Uxoa Iñurrieta, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola, Itziar Aduriz and John Carroll
Improving Translation Selection with Supersenses
Haiqing Tang, Deyi Xiong, Oier Lopez de Lacalle and Eneko Agirre
The impact of simple feature engineering in multilingual medical NER
Rebecka Weegar, Arantza Casillas, Arantza Diaz de Ilarraza, Maite Oronoz, Alicia Pérez and Koldo Gojenola
Clinical Natural Language Processing Workshop
A Preliminary Study of Statistically Predictive Syntactic Complexity Features and Manual Simplifications in Basque
Itziar Gonzalez-Dios, María Jesús Aranzabe and Arantza Díaz de Ilarraza
Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Comparing two Basic Methods for Discriminating Between Similar Languages and Varieties

Pablo Gamallo, José Ramom Pichel, Iñnaki Alegria and Manex Agirrezabal

Third Workshop on NLP for Similar Languages, Varieties and Dialects

]]>