Language Resources – Ixa Group. Language Technology. https://www.ehu.eus/ehusfera/ixa
News from the Ixa Group in the University of the Basque Country

The Basque WordNet semantic dictionary is a “public resource” now
Posted: Fri, 13 Jun 2014

Machines need computing tools that are more powerful than conventional dictionaries for tasks such as information extraction and word sense disambiguation. This is precisely the function of Euskal WordNet, developed by the IXA Group (UPV/EHU), which can now be consulted and downloaded free of charge.

This is the first Lexical Knowledge Base (LKB) developed for the Basque language: a “semantic dictionary” or “store” that compiles and organises lexical and semantic information. “It’s like a database, but it not only gathers the usual information found in a dictionary (the meanings of words and their definitions and examples), it also links the concepts with each other,” said Eneko Agirre, a computer scientist in the IXA Group.

If we look up the entry hatz (“finger”, “digit” or “toe” in Basque), we get the definition: “Each of the five appendages at the end of human hands and feet.” That is what the term means. But beyond this, we can learn much more: the finger/toe is an appendage of the body; the thumb is a finger; fingers are part of the hand; hands, in turn, are part of the arm; fingers are used to touch objects; and so on. In short, all the concepts are interrelated hierarchically. Every concept is also linked to its equivalents in other languages: digit, hatz, dedo, dixito and dit.

Consulting the word hatz in Basque WordNet.
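The relations in the hatz example can be pictured as a small graph of concepts. The minimal Python sketch below assumes a hand-built table whose concept ids (`finger`, `thumb`, ...) and relation names (`is_a`, `part_of`, `used_for`) are hypothetical, not Basque WordNet's actual format. It shows how a lexical knowledge base can answer questions that a flat dictionary cannot:

```python
# Toy lexical knowledge base mirroring the "hatz" example (hand-built, not Basque WordNet data).
LKB = {
    "finger": {
        "lemmas": ["hatz", "digit", "dedo", "dixito", "dit"],  # cross-lingual equivalents
        "is_a": "appendage",
        "part_of": "hand",
        "used_for": "touch",
    },
    "thumb": {"lemmas": ["thumb"], "is_a": "finger"},
    "hand": {"lemmas": ["hand"], "part_of": "arm"},
    "arm": {"lemmas": ["arm"]},
    "appendage": {"lemmas": ["appendage"]},
}


def concepts_for(lemma: str) -> list[str]:
    """Return the ids of every concept that has this lemma, in any language."""
    return [cid for cid, concept in LKB.items() if lemma in concept["lemmas"]]


def follow(concept: str, relation: str) -> list[str]:
    """Follow one relation transitively, e.g. part_of: finger -> hand -> arm."""
    path = []
    current = LKB[concept].get(relation)
    while current is not None:
        path.append(current)
        # Targets that are not concepts in the toy LKB (e.g. "touch") end the chain.
        current = LKB.get(current, {}).get(relation)
    return path


def equivalents(lemma: str) -> list[str]:
    """Other lemmas sharing a concept with `lemma` (its translations in this toy)."""
    found = {other for cid in concepts_for(lemma) for other in LKB[cid]["lemmas"]}
    return sorted(found - {lemma})


print(concepts_for("hatz"))         # ['finger']
print(follow("finger", "part_of"))  # ['hand', 'arm']
print(follow("thumb", "is_a"))      # ['finger', 'appendage']
print(equivalents("hatz"))          # ['dedo', 'digit', 'dit', 'dixito']
```

Real wordnets store such links as typed relations between synsets and connect synsets across languages; this toy folds both ideas into a single table.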

This database is extremely useful in fields such as machine translation, information extraction, word sense disambiguation and question-answering systems. In machine translation, for example, the system has to know which sense of a word it is translating, and for that it needs a “semantic dictionary” of this type. “For a quality translation, it is necessary to be able to distinguish the most appropriate meaning from among the various possible ones,” stressed Agirre.

“Our aim, within the framework of the European QTLeap project, is to improve the quality of machine translation by using WordNet,” he added.
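Choosing “the most appropriate meaning from among the various ones” is the word sense disambiguation task. As an illustration only, here is the simplified Lesk heuristic, a classic baseline that is not necessarily what IXA or QTLeap use. It picks the sense whose dictionary gloss shares the most words with the surrounding context. The two-sense inventory for the English word digit and the stop-word list are made up for the example:

```python
import re

# Illustrative stop-word list and sense inventory (hypothetical, not taken from WordNet).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "on", "in", "his", "her", "each", "one"}

SENSES = {
    "digit": {
        "digit.finger": "each of the five appendages at the end of human hands and feet",
        "digit.numeral": "one of the symbols 0 to 9 used to write numbers",
    },
}


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens minus stop words (no stemming)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> str:
    """Pick the sense whose gloss shares the most content words with the context."""
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("digit", "He could not feel a single digit on his hands or feet"))  # digit.finger
print(simplified_lesk("digit", "Write each digit of the numbers clearly"))                # digit.numeral
```

Without stemming, hand and hands would not match; practical systems add lemmatization, and richer knowledge-based methods also exploit the wordnet relations themselves rather than gloss overlap alone.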

In the 2014-2015 academic year, the university Master’s degree in Language Analysis and Processing (LAP) that the IXA Group runs at the UPV/EHU will cover the Basque WordNet and other language technologies used to develop similar applications.

Master’s in Language Analysis and Processing (LAP)

The aim of the University Master’s in Language Analysis and Processing is to analyse language and to learn about the techniques and applications available for processing it by computer.

This Master’s has been organised by the UPV/EHU’s IXA Group and is aimed at anyone who combines linguistics and computing: philologists and linguists, computing and telecommunications engineers, mathematicians, translators, etc. To apply, it is enough to hold a university degree, have some experience and show an interest in the subject.

The Master’s will last a year and a half, and classes will be held at the Computing Faculty of the UPV/EHU-University of the Basque Country. It will also be possible to spread it over two or three academic years to accommodate working professionals.

The pre-registration period is already open, and applications will be accepted until June 30. For further information on the Master’s, please check out http://ixa.si.ehu.es/master/.

Seminar: ‘The Lexikoaren Behatokia project’ & ‘Enriching EDBL with Hiztegi Batua’ (12/11/2013)
Posted: Fri, 13 Dec 2013

Topics: The Lexikoaren Behatokia project (X. Artola) + Enriching EDBL with Hiztegi Batua (Gorka Labaka – Xabier Artola)
Speakers: Xabier Artola and Gorka Labaka
Day: December 11th 2013, Wednesday

In 2008 the Basque language academy, Euskaltzaindia, launched the Lexikoaren Behatokia project (“The Lexicon Observatory”), led by Andoni Sagarna. The objective of the project was to create a labelled and linguistically annotated corpus for research. The corpus was to be compiled from a variety of sources, mostly the media, and especially general-interest outlets. By late 2012 the corpus contained 26,565,924 words, and it has been expanded year after year. Euskaltzaindia, the IXA research group, the Elhuyar foundation and UZEI collaborate on the project.

The Lexikoaren Behatokia corpus is available here.

In addition, the latest version of the prescriptive dictionary Hiztegi Batua has provided new entries for the Basque lexical database EDBL, and the seminar explained how this enrichment was carried out.

Arbel digitala: a tool for writing verses in Basque is online
Posted: Mon, 18 Feb 2013

Three members of the IXA Group (Manex Agirrezabal, Bertol Arrieta and Iñaki Alegria), in collaboration with the Association of Friends of Bertsolaritza (AFB, Bertsozale Elkartea), have developed Arbel digitala (“digital blackboard”), a new tool that applies language technology to verse-making in order to help train verse-makers. It was presented last January at the Koldo Mitxelena Library by Manex and Bertol, together with the AFB members Aritz Zerain and Ixiar Eizagirre.

The tool has different capabilities:

  • different stanzas and melodies accessible from a database,
  • rhyme and synonym search engine,
  • syllable counter…
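For the syllable counter, a rough sketch might look like the Python below. It assumes that each vowel nucleus is one syllable and that ai, ei, oi, ui, au and eu are diphthongs. Arbel digitala's actual rules are not described here and likely handle more cases, such as vowels merging across word boundaries:

```python
import re

VOWELS = set("aeiou")
# Simplifying assumption: these vowel pairs are read as one syllable (falling diphthongs).
DIPHTHONGS = {"ai", "ei", "oi", "ui", "au", "eu"}


def count_syllables(word: str) -> int:
    """Heuristic Basque syllable count: one per vowel nucleus, diphthongs merged."""
    w = word.lower()
    count, i = 0, 0
    while i < len(w):
        if w[i] in VOWELS:
            count += 1
            # Skip the second vowel of a diphthong so it is not counted again.
            i += 2 if w[i:i + 2] in DIPHTHONGS else 1
        else:
            i += 1
    return count


def count_line(line: str) -> int:
    """Sum syllables over the words of a verse line (ignores synalepha and elision)."""
    return sum(count_syllables(w) for w in re.findall(r"[a-zñ]+", line.lower()))


print(count_syllables("bertsolaria"))  # 5: ber-tso-la-ri-a
print(count_syllables("euskara"))      # 3: eus-ka-ra
print(count_line("gaur euskara"))      # 4
```

A verse-making aid would compare `count_line` against the syllable pattern of the chosen stanza, line by line.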

This application is more powerful than Bertsolarixa, a system created some years ago. The same capabilities were also used a few months ago to build a robot verse-maker.

If you want to know more about the Arbel digitala tool, follow this link. Try it, and maybe you’ll write an incredible verse with this artificial inspiration!

This news was covered in several media: Berria, bertso-eskolak.com, Diario Vasco and Hamaika TV.

Seminar. First steps towards Quechua processing. (2012/11/15)
Posted: Mon, 12 Nov 2012

Hugo and Richard visiting Aholab in Bilbao.

Speakers: Hugo Quispe and Richard Castro (Universidad UNSAAC of Cusco, Peru), Olatz Arregi, Xabier Artola and Kepa Sarasola (Ixa Group)
Title: Primera aproximación al procesamiento automático del Quechua (First steps towards the automatic processing of Quechua)
Date: November 15, 2012, Thursday
Time: 16:00–17:00
Where: Computer Science Faculty, Room 3.2

Abstract

Quechua (Runa Simi), the native language of the Inca culture in Peru, is a family of languages in Latin America. Factors such as Westernization, among others, have made Quechua a vulnerable language in danger of extinction.

We, a group of lecturers and researchers from the IXA group of the UPV/EHU together with UNSAAC in Cusco, Peru, are working to lay the foundations of what is intended to become the language engineering centre of Cusco. The aim is to develop the first basic resources and tools for the automatic processing of Quechua. The topics we are working on are: compiling a text corpus; a lexical database for the Quechua language (BDLQ) and future tools derived from it; using the FOMA tool for morphological analysis; and building a TTS system, as basic tools for processing the language.

In this way, the basis for support and teamwork between the two universities has been consolidated, for the benefit of a language in a critical situation.
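Morphological analysis of the kind mentioned above is what finite-state tools such as Foma are built for: a word is recognised by walking a root lexicon and then optional suffix classes in a fixed order. The Python sketch below only imitates that idea for a few Quechua noun forms. The roots, suffixes and tags are a small illustrative subset chosen for the example, not the BDLQ or the project's Foma grammar:

```python
# Toy Quechua noun analyser; roots and suffixes are a small illustrative subset.
ROOTS = {"wasi", "runa", "mayu"}  # house, person, river
# Suffix slots in attachment order, each optional, like lexc continuation classes.
SLOTS = [
    {"y": "+Poss1Sg", "yki": "+Poss2Sg"},             # possessive: my, your
    {"kuna": "+Pl"},                                 # plural
    {"pi": "+Loc", "man": "+All", "manta": "+Abl"},  # case: in, to, from
]


def analyse(word: str) -> list[str]:
    """Return every root+suffix segmentation of `word` that the slots allow."""
    results: list[str] = []

    def walk(rest: str, slot: int, analysis: str) -> None:
        if not rest:
            results.append(analysis)
            return
        if slot == len(SLOTS):
            return  # leftover characters: no valid parse on this path
        walk(rest, slot + 1, analysis)  # the slot is optional: skip it
        for form, tag in SLOTS[slot].items():
            if rest.startswith(form):
                walk(rest[len(form):], slot + 1, analysis + tag)

    for root in sorted(ROOTS):
        if word.startswith(root):
            walk(word[len(root):], 0, root)
    return results


print(analyse("wasikunapi"))      # ['wasi+Pl+Loc']          "in the houses"
print(analyse("wasiykunamanta"))  # ['wasi+Poss1Sg+Pl+Abl']  "from my houses"
print(analyse("runa"))            # ['runa']
```

A real Foma grammar compiles such lexicons into a transducer that can also run in the generation direction and handles sound alternations at morpheme boundaries, which this toy ignores.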

Hugo and Richard visiting Ixa Group in Donostia.

Quechua (Runa Simi) is a native South American language family and dialect cluster spoken primarily in the Andes. It is the most widely spoken language family of the indigenous peoples of the Americas, with a total of probably some 8 to 10 million speakers. Like Basque, Quechua remains alive, but over the last centuries it has suffered continuous decline, and the region in which it is spoken has become smaller and smaller. As happened with Basque, Quechua was not an official language: it was kept out of the educational system, the media and industrial environments. Today Quechua holds co-official status in Peru and Bolivia, even though it has no regulated standard. Although there have been several changes in recent years, Quechua is still associated with lack of education and stigmatized as uneducated, rural, or lacking economic resources and power, as Basque was some years ago. Language technology may help the Quechua-speaking community and scholars to build a standard, thus opening a door to Quechua’s future in the digital world. Corpus tools, lexical databases and spelling checkers have proven useful to that end for other languages such as Basque.
The group created by Prof. Juan Cruz at UNSAAC University in Cusco (Peru) has been collaborating with Ixa Group and Aholab since the beginning of 2012. In this seminar, Hugo Quispe and Richard Castro will present their work on the definition of a lexical database and a text-to-speech (TTS) system for Quechua.

The group of Cusco (January 2012)

Talk. A. Kilgarriff: Getting to Know Your Corpus (2012/11/07)
Posted: Sun, 04 Nov 2012

Speaker: Adam Kilgarriff (Lexical Computing Ltd., Brighton)
Title: Getting to Know Your Corpus
Date: November 7, 2012, Wednesday
Time: 16:00
Where: Computer Science Faculty, Room 3.2

Abstract

Corpora are not easy to get a handle on. The usual way of getting to grips with text is to read it, but corpora are mostly too big to read (and not designed to be read). We show, with examples, how keyword lists (of one corpus vs. another) are a direct, practical and fascinating way to explore the characteristics of corpora, and of text types. Our method is to classify the top one hundred keywords of corpus1 vs. corpus2, and of corpus2 vs. corpus1. This promptly reveals a range of contrasts between all the pairs of corpora we apply it to. We also present improved maths for keywords, and quantitative comparisons between corpora. All the methods discussed (and almost all of the corpora) are available in the Sketch Engine, a leading corpus query tool.
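One way to compute such keyword lists compares normalised frequencies, with a smoothing constant n added to both sides of the ratio. This assumes the “simple maths” score Kilgarriff has described for the Sketch Engine; the abstract does not say which formula the talk presents. A minimal Python version:

```python
from collections import Counter


def per_million(counts: Counter, total: int) -> dict[str, float]:
    """Normalise raw counts to frequencies per million tokens."""
    return {w: c * 1_000_000 / total for w, c in counts.items()}


def keywords(focus: list[str], reference: list[str], n: float = 1.0, top: int = 100):
    """Rank focus-corpus words by (fpm_focus + n) / (fpm_reference + n)."""
    focus_fpm = per_million(Counter(focus), len(focus))
    ref_fpm = per_million(Counter(reference), len(reference))
    scores = {w: (f + n) / (ref_fpm.get(w, 0.0) + n) for w, f in focus_fpm.items()}
    # Highest score first; ties broken alphabetically for a stable listing.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


corpus1 = "the cat sat on the mat".split()
corpus2 = "the dog sat on the log".split()
print([w for w, _ in keywords(corpus1, corpus2, top=2)])  # ['cat', 'mat']
print([w for w, _ in keywords(corpus2, corpus1, top=2)])  # ['dog', 'log']
```

Running it in both directions gives the two top-hundred lists to classify. Raising n shifts the lists towards more frequent words, while lowering it favours rarer ones.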

BerbaTek project’s results and demos.
Posted: Fri, 10 Feb 2012

BerbaTek is a recently completed three-year strategic research project (2009-2011) funded by the Industry Department of the Basque Government. It was carried out by a consortium made up of the Elhuyar Foundation, the IXA and Aholab research groups of the UPV/EHU-University of the Basque Country, the technology centre Vicomtech and the foundation Tecnalia Research & Innovation.

 

Yesterday the results of this project were presented at a press conference with representatives of the Basque Government. Throughout BerbaTek we have created several language tools and resources, as well as demos that show the potential of integrating language, speech and multimedia technologies to build applications for the areas that make up the language industry, namely translation, content and teaching. Three demos were presented at the press conference:

  • Automatic dubbing demo. Automatic dubbing of films is still a difficult challenge (different voices, colloquial language, different speeds), but for some types of documentaries (a single speaker, voice-over, lip synchronization unnecessary or unimportant) we have built a demo that performs satisfactorily. Given a documentary in Spanish and its transcription (which can be obtained automatically with any of the Spanish dictation programs on the market), Vicomtech-IK4’s temporal alignment technology creates a subtitle file: a transcription with time marks for the beginning and end of each sentence. Then the Matxin MT system, developed by the IXA group, automatically translates the subtitles into Basque, and Aholab’s text-to-speech technology generates the synchronized voice. We have successfully applied this demo to the single-speaker sections of the television programme Teknopolis, produced by Elhuyar. This demo can be seen at work here.
  • Semantic multimedia search engine for science and technology content. This search engine is based on WNTerm, an ontology specialized in science and technology which was created by Elhuyar and IXA. It is a network in which scientific and technological terms are semantically related to each other through subclasses, synonyms, etc. A new, augmented version will be presented next month.
  • Personal teacher for language learning. For the field of education, we have created a demo of a personal tutor for language learning. The tutor is a 3D avatar developed by Vicomtech-IK4 that shows emotions and can speak and understand Basque, using Aholab’s technology. The tutor assists us in various tasks: we can do grammar exercises (verb conjugation, word inflection) and reading comprehension exercises (filling in gaps in a text by choosing from several options) that are created automatically from texts using technology from IXA; we can evaluate our pronunciation with Aholab technology; and it helps us write texts, with word inflection, writing out numbers or querying dictionaries, by means of technology from IXA and Elhuyar. For the moment this demo works in local mode, but it will be available online by next spring.
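The dubbing pipeline hinges on the time-marked subtitle file. The post does not name its format; the sketch below assumes the widely used SRT layout. It writes aligned (and already translated) sentences out as SRT, and adds a hypothetical helper that reports how much synthesised Basque speech would need to be sped up to fit each time slot. The alignment, Matxin translation and Aholab synthesis steps themselves are outside this sketch:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str     # e.g. the Basque translation of one aligned sentence


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Number the segments and emit them as SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n{seg.text}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def speed_up_needed(seg: Segment, speech_seconds: float) -> float:
    """Hypothetical check: factor by which speech must be sped up to fit its slot (1.0 = fits)."""
    return max(1.0, speech_seconds / (seg.end - seg.start))


subs = [Segment(0.0, 2.5, "Kaixo!"), Segment(2.5, 61.25, "Agur!")]
print(to_srt(subs))
print(speed_up_needed(subs[0], 3.0))  # 1.2
```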

The news has been covered by the media today.

Further information about this project can be found on the BerbaTek project’s website.

Talk. Tegau Andrews. An overview of Welsh language technologies. (2011/11/02)
Posted: Fri, 28 Oct 2011

Ixa Group has often collaborated with Bangor University on language technology for less-resourced languages, mainly Basque and Welsh, within the framework of SALTMIL.
Briony Williams, Delyth Prys and Gruff Prys are our Welsh contacts.
Tegau Andrews from Bangor University will be with us next week, and we have scheduled this talk:

Speaker: Tegau Andrews (Bangor University, Wales)
Uned Technolegau Iaith / Language Technologies Unit
Prifysgol Bangor / Bangor University

When: November 2, Wednesday
Where: Room 3.2
Time: 15:00
Title: From terminology standardization systems to machine translation: An overview of Welsh language technologies

Abstract:

“An endangered language will progress if its speakers can make use of electronic technology,” postulates Wales-based linguistics professor David Crystal (Language Death, 2000: 141). Welsh, spoken by 20.8% of the population of Wales (Census 2001), is classed as a vulnerable language by UNESCO, yet the Welsh Government’s stated aim is to make Wales a truly bilingual nation.

This talk will focus on the progress being made in developing language technologies for Welsh speakers. It will range over topics such as Welsh machine translation, computer-aided translation tools, text-to-speech technology, terminology portals and e-learning resources, and present an overview of the work being done at the Terminology and Language Technologies Unit at Bangor University. The aim of such work is to enable and encourage Welsh speakers to use electronic technology in their own language.

Mitxelena Award for PhD theses: Maite Oronoz and Larraitz Uria
Posted: Wed, 06 Apr 2011

Last Monday our colleague Maite Oronoz won the 2nd Koldo Mitxelena Award for PhD Theses, organized by Euskaltzaindia (the Academy of the Basque Language) and the University of the Basque Country.

CONGRATULATIONS Maite!

Our colleague Larraitz Uria’s PhD thesis was also nominated for this award.

Both theses deal with language error detection: Maite’s thesis approaches it from a computational point of view, while Larraitz’ work takes a linguistic perspective.

Title of Maite’s thesis: Euskarazko errore sintaktikoak detektatzeko eta zuzentzeko baliabideen garapena: datak, postposizio-lokuzioak eta komunztadura.
(Saroi, a system to detect and correct syntactic mistakes: dates, complex postpositions, and agreement.)
Maite’s supervisors: Arantza Diaz de Ilarraza and Koldo Gojenola
Title of Larraitz’ thesis: Euskarazko erroreen eta desbideratzeen analisirako lan-ingurunea. Determinatzaile-erroreen azterketa eta prozesamendua.
(A framework for the analysis of errors and deviations in Basque texts. Analysis and processing of errors on the use of determiners.)
Larraitz’ supervisors: Igone Zabala and Montse Maritxalar
Publications:

40th anniversary of the Centro de Lingüística Aplicada in Santiago de Cuba.
Posted: Fri, 25 Feb 2011

In January Iñaki Alegria participated in the XII Simposium de Comunicación Social organized by the Centro de Lingüística Aplicada (CLA) in Santiago de Cuba.

The title of his course was:

“Computational Morphology: trends, finite-states and open-source”
(Spanish title: Evolución de la morfología computacional: nuevas posibilidades, “The evolution of computational morphology: new possibilities”)

Foma, the application developed by Mans Hulden (University of Helsinki), was the main tool used in this tutorial.

As the CLA is celebrating its 40th anniversary this year, it has sent the Ixa Group a sculpture (see the picture) to commemorate our cooperation.

THANK YOU VERY MUCH!

And CONGRATULATIONS to Eloina, Julio Vitelio, Leonel, and all the colleagues who created this research centre and have been promoting it!

The IXA group has been collaborating with the CLA for 10 years. One of the fruits of this collaboration is the third edition of the Diccionario Básico Escolar (DBE). This dictionary is encoded in XML and has been implemented using leXkit, an application developed by the Ixa Group for dictionary management.
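Because the DBE is stored as XML, each entry can be read with any standard XML library. The post does not describe the DBE or leXkit schema, so the element names below (`entry`, `lemma`, `pos`, `sense`, `def`) and the sample entry are hypothetical. The Python sketch only shows the general pattern:

```python
import xml.etree.ElementTree as ET

# Hypothetical entry layout; the real DBE/leXkit schema is not given in the post.
ENTRY_XML = """
<entry id="casa">
  <lemma>casa</lemma>
  <pos>sustantivo</pos>
  <sense n="1"><def>Edificio para habitar.</def></sense>
  <sense n="2"><def>Familia que vive junta.</def></sense>
</entry>
"""


def parse_entry(xml_text: str) -> dict:
    """Turn one <entry> element into a plain dict of lemma, part of speech and senses."""
    entry = ET.fromstring(xml_text.strip())
    return {
        "id": entry.get("id"),
        "lemma": entry.findtext("lemma"),
        "pos": entry.findtext("pos"),
        "senses": [s.findtext("def") for s in entry.findall("sense")],
    }


print(parse_entry(ENTRY_XML))
```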

Basque version of this news item (Berri hau euskaraz)

Roser Morante’s talk: Modality and negation in natural language processing (2011/02/23)
Posted: Tue, 15 Feb 2011

Speaker: Roser Morante, senior researcher on the BIOGRAPH project led by Walter Daelemans, CLiPS Computational Linguistics research group, University of Antwerp
Date: February 23, 2011
Time: 16:00
Where: Computer Science Faculty, meeting room (batzar aretoa)

Title: Modality and negation in natural language processing: current trends and future directions

Summary:
Research on modality and negation focuses on finding subjective, uncertain and counterfactual information in texts, be it in scientific papers, product reviews, or opinions in blogs. This type of research is concerned with processing texts at the information level and aims at deep text understanding. Modality and negation are phenomena relevant for all applications that are concerned with some form of text understanding, including text mining, sentiment analysis, recognizing textual entailment, information extraction, text summarization, and question answering. Hence, the adequate modeling of these phenomena is of crucial importance to the natural language processing (NLP) community as a whole.

Although the study of modality has a long tradition from a theoretical perspective, only in recent years have these topics attracted the attention of NLP researchers, driven mainly by the development of sentiment analysis techniques and the growing need to mine biomedical texts. In this talk I will define modality and negation from an NLP perspective, motivate the need for processing these phenomena, and summarize existing research on processing modality and negation, touching on diverse aspects ranging from task modelling to feature visualization. Finally, I will speculate about future developments in this research area.
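To make the task concrete: given a cue such as no or suggest, a system must decide which words fall within its scope. The sketch below is a deliberately naive rule-based baseline, with fixed cue lists and a fixed token window, loosely in the spirit of early rule-based negation detectors. It is not the approach presented in the talk, and the cue lists and window size are arbitrary choices for illustration:

```python
import re

# Illustrative cue lists; real systems learn cues and scopes from annotated corpora.
NEGATION_CUES = {"no", "not", "never", "without", "absence"}
SPECULATION_CUES = {"may", "might", "suggest", "suggests", "possibly", "whether"}
SCOPE_BREAKERS = {"but", "however", "although", ",", ";", "."}


def cue_scope(sentence: str, cues: set[str], window: int = 5) -> list[str]:
    """Mark up to `window` tokens after each cue as in scope, stopping at a breaker."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    in_scope = [False] * len(tokens)
    for i, token in enumerate(tokens):
        if token not in cues:
            continue
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            if tokens[j] in SCOPE_BREAKERS:
                break
            in_scope[j] = True
    return [t for t, flagged in zip(tokens, in_scope) if flagged]


print(cue_scope("The patient shows no signs of fever, but reports pain.", NEGATION_CUES))
# ['signs', 'of', 'fever']
print(cue_scope("These results suggest that the protein binds DNA.", SPECULATION_CUES))
# ['that', 'the', 'protein', 'binds', 'dna']
```

A fixed window both over- and under-shoots on real sentences, which is why approaches trained on annotated corpora are commonly used instead.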