collaborating – Ixa Group. Language Technology.
News from the Ixa Group in the University of the Basque Country
https://www.ehu.eus/ehusfera/ixa
Wed, 03 Dec 2014 15:17:18 +0000

Seminar: ‘The Lexikoaren Behatokia project’ & ‘Enriching EDBL with Hiztegi Batua’ (12/11/2013)
https://www.ehu.eus/ehusfera/ixa/2013/12/13/seminar-the-lexikoaren-behatokia-project-plus-enriching-edbl-with-hiztegi-batua-12112013/
Fri, 13 Dec 2013 09:01:11 +0000

Topics: The Lexikoaren Behatokia project (Xabier Artola) + Enriching EDBL with Hiztegi Batua (Gorka Labaka – Xabier Artola)
Speakers: Xabier Artola and Gorka Labaka
Day: Wednesday, December 11th, 2013

In 2008 the Basque language academy, Euskaltzaindia, launched the Lexikoaren Behatokia project (“The Lexicon Observatory”), led by Andoni Sagarna. The objective of the project was to create a labelled and linguistically annotated corpus for research. The corpus was to be compiled from a variety of sources, mostly media, especially general-interest outlets. In late 2012 the corpus consisted of 26,565,924 words, and it has been expanded year after year. Euskaltzaindia, the IXA research group, the Elhuyar foundation and UZEI collaborate on the project.

The Lexikoaren Behatokia corpus is available here.

In addition, the latest version of the prescriptive dictionary Hiztegi Batua has provided new entries for the Basque lexical database EDBL. This enrichment process was explained in the seminar.

Seminar. First steps towards Quechua’s processing. (2012/11/15)
https://www.ehu.eus/ehusfera/ixa/2012/11/12/quechua/
Mon, 12 Nov 2012 20:00:50 +0000

Hugo and Richard visiting Aholab in Bilbao.

Speakers: Hugo Quispe and Richard Castro (Universidad UNSAAC of Cusco, Peru); Olatz Arregi, Xabier Artola and Kepa Sarasola (Ixa Group)
Title: Primera aproximación al procesamiento automático del Quechua (First steps towards Quechua’s processing)
Date: Thursday, November 15, 2012
Time: 16:00–17:00
Where: Computer Science Faculty, Room 3.2

Abstract

Quechua (Runa Simi), the native language of the Inca culture in Peru, is a family of languages in Latin America. The current situation of the language, due to factors such as westernization among others, has made Quechua a vulnerable language in danger of extinction.

A group of lecturers and researchers from the IXA group of the UPV/EHU, together with UNSAAC in Cusco, Peru, are working to lay the foundations of what is intended to become Cusco's language engineering centre. The aim is to develop the first basic resources and tools for the automatic processing of Quechua. The topics we are working on are: the compilation of a text corpus, a lexical database for the Quechua language (BDLQ) and future tools derived from it, the use of the FOMA tool for morphological analysis, and the creation of a TTS system, as basic tools for processing the language.

In this way, the foundations for support and teamwork between the two universities have been consolidated, for the benefit of a language in a critical situation.

Hugo and Richard visiting Ixa Group in Donostia.

Quechua (Runa Simi) is a native South American language family and dialect cluster spoken primarily in the Andes. It is the most widely spoken language family of the indigenous peoples of the Americas, with probably some 8 to 10 million speakers. Like Basque, Quechua remains alive, but over the last centuries it has suffered continuous regression, and the region in which it is spoken is becoming smaller and smaller. As happened with Basque, Quechua was not an official language, and it has been kept out of the educational system, the media and industrial environments. Today Quechua holds co-official status in Peru and Bolivia, even though it is not regulated. Yet, despite several changes in recent years, Quechua is still associated with a lack of education and stigmatized as uneducated, rural, or lacking economic resources and power, as Basque was some years ago. Language technology may help the community of Quechua speakers and scholars to build a standard, opening a door to Quechua’s future in the digital world. Corpus tools, lexical databases and spelling checkers have proven useful in this respect for other languages such as Basque.
The group created by Prof. Juan Cruz at UNSAAC University in Cusco (Peru) has been collaborating with the Ixa Group and Aholab since the beginning of 2012. In this seminar Hugo Quispe and Richard Castro will present their work on the definition of a lexical database and a text-to-speech (TTS) system for Quechua.

The group of Cusco (January 2012)

BerbaTek project’s results and demos.
https://www.ehu.eus/ehusfera/ixa/2012/02/10/berbatek-projects-results-and-demos/
Fri, 10 Feb 2012 16:36:38 +0000

BerbaTek is a recently finished three-year strategic research project (2009-2011) funded by the Industry Department of the Basque Government. To carry out the project, a consortium was created, made up of the Elhuyar Foundation, the IXA and Aholab research groups of the UPV/EHU (University of the Basque Country), the technology centre Vicomtech and the foundation Tecnalia Research & Innovation.

 

Yesterday, the results of this project were presented at a press conference with representatives of the Basque Government. Throughout the BerbaTek project we have created several language tools, resources and demos to show the potential of integrating language, voice and multimedia technologies when creating applications for the areas that make up the language industry, namely translation, content and teaching. Three demos were presented at the press conference:

  • Automatic dubbing demo. The automatic dubbing of films is still a difficult challenge (different voices, colloquial language, different speeds), but for some types of documentaries (single speaker, voice-over, lip synchronization unnecessary or unimportant) we have built a demo that performs satisfactorily. Given a documentary in Spanish and its transcription (which can be obtained automatically with any of the dictation programs for Spanish on the market), Vicomtech-IK4’s temporal alignment technology creates a subtitle file: a transcription with time marks for the beginning and end of each sentence. Then the Matxin MT system, developed by the IXA group, automatically translates the subtitles into Basque, and Aholab’s text-to-speech technology produces the synchronized voice. We have successfully applied this demo to the single-speaker sections of the television program Teknopolis, produced by Elhuyar. This demo can be seen at work here.
  • Semantic multimedia search engine for science and technology content. This search engine is based on WNTerm, an ontology specialized in science and technology which was created by Elhuyar and IXA. It is a network in which scientific and technological terms are semantically related to each other through subclasses, synonyms, etc. A new, augmented version will be presented next month.
  • Personal teacher for language learning. For the field of education, we have created a demo of a personal tutor for language learning. The tutor is a 3D avatar developed by Vicomtech-IK4 that shows emotions, can speak Basque and can understand what is said in Basque, using Aholab’s technology. The tutor assists us in various tasks: we can do grammar exercises (verb conjugation, word inflection) and reading comprehension exercises (filling gaps in a text by choosing from several options) that are created automatically from texts using technology from IXA; we can evaluate our pronunciation with Aholab technology; and it helps us when writing texts, with word inflection, writing of numbers or dictionary look-up, by means of technology from IXA and Elhuyar. For the moment this demo works only locally, but it will be available online by next spring.
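
The data flow of the automatic dubbing demo (timed subtitles, then translation, then speech synthesis for each time slot) can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Segment` type and the `dub`, `toy_translate` and `toy_synthesize` functions are hypothetical names invented here, and they do not reproduce the actual interfaces of the Vicomtech-IK4 aligner, Matxin or Aholab's text-to-speech engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """One subtitle sentence with its start and end time in seconds."""
    start: float
    end: float
    text: str


def dub(segments: List[Segment],
        translate: Callable[[str], str],
        synthesize: Callable[[str, float], bytes]
        ) -> List[Tuple[float, float, str, bytes]]:
    """Translate each timed sentence and synthesize audio for its time slot.

    `translate` and `synthesize` are hypothetical stand-ins for an MT system
    and a text-to-speech engine; they are not real Matxin or Aholab calls.
    """
    dubbed = []
    for seg in segments:
        target_text = translate(seg.text)
        # The synthesized voice has to fit the original sentence's time slot.
        duration = seg.end - seg.start
        audio = synthesize(target_text, duration)
        dubbed.append((seg.start, seg.end, target_text, audio))
    return dubbed


# Toy stand-ins so that the sketch runs end to end.
def toy_translate(text: str) -> str:
    # Only tags the text; a real system would return a Basque translation.
    return f"[eu] {text}"


def toy_synthesize(text: str, duration: float) -> bytes:
    # Pretend that audio is one zero byte per 10 ms of the time slot.
    return bytes(int(duration * 100))


subtitles = [
    Segment(0.0, 2.5, "Bienvenidos a Teknopolis."),
    Segment(2.5, 6.0, "Hoy hablamos de tecnología."),
]
result = dub(subtitles, toy_translate, toy_synthesize)
```

Passing the translator and the synthesizer in as functions keeps the time marks as the only shared contract between the three stages, which mirrors how the demo chains components from different groups.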

The news was covered by the media today.

Further information about this project can be found at the BerbaTek project’s website.

40th anniversary of the Centro de Lingüística Aplicada in Santiago de Cuba.
https://www.ehu.eus/ehusfera/ixa/2011/02/25/cla_40th_anniversary/
Fri, 25 Feb 2011 12:36:49 +0000

In January Iñaki Alegria participated in the XII Simposium de Comunicación Social organized by Centro de Lingüística Aplicada (CLA) in Santiago de Cuba.

The title of his course was:

“Computational Morphology: trends, finite-states and open-source”
(Evolución de la morfología computacional: nuevas posibilidades)

Foma, the application developed by Mans Hulden (University of Helsinki), was the main tool used in this tutorial.

As the CLA Centre is celebrating its 40th anniversary this year, they have sent the Ixa Group a sculpture (see the picture) to commemorate our co-operation.

THANK YOU VERY MUCH!

And CONGRATULATIONS to Eloina, Julio Vitelio, Leonel, and all those compañeros who created this research centre and have been promoting it!

The IXA group has been collaborating with CLA for 10 years. One of the fruits of this collaboration is the third edition of the Diccionario Básico Escolar (DBE). This dictionary is coded in XML and has been implemented using leXkit, an application developed by the Ixa Group for dictionary management.

Basque version of this news item / Berri hau euskaraz

Collaborating on language processing for Basque and Sami (Laponian)
https://www.ehu.eus/ehusfera/ixa/2010/06/28/collaborating-on-language-processing-for-basque-and-sami-laponian/
Mon, 28 Jun 2010 22:16:17 +0000

Researchers working on Basque and Sami (Laponian) are collaborating on Automatic Language Processing.

Linda Wiechetek, a researcher from the University of Tromsø (Norway), is visiting the Ixa Group in Donostia from April to July 2010. Her visit is funded by the NILS mobility project.

Why Sami and Basque? Why do we work with this unusual language pair?

Some of the reasons are:
1) Both are small languages.
2) Both have limited resources for the use of language technology (Sami is currently even less resourced than Basque).
3) Sami and Basque morphologies are very rich and demand adequate tools such as our morphological transducers and syntactic disambiguation and analysis modules. Many of the better-resourced languages with highly developed language technology, such as English, Spanish and French, do not need such complex modules to create their basic tools.
4) There are clear syntactic parallels between Basque and Sami, including grammatical cases/postpositions that cause morpho-syntactic ambiguity.

In this context we are collaborating in the following ways:
a) Use of semantic prototype features in Constraint Grammar for syntactic disambiguation.
b) Use of semantic features in Constraint Grammar for lexical/syntactic transfer in Machine Translation.
c) Use of information on verb subcategorization for syntactic disambiguation.
d) Use of verb subcategorization information for lexical and syntactic transfer in Machine Translation.

The parser for Basque is not yet as accurate as English parsers. The Sami parser, on the other hand, achieves good accuracy, but valency information is necessary for other tasks such as MT and QA.
With this collaboration between Basque and Sami researchers we aim to improve our NLP tools.

Besides that, Linda can now speak some Basque, and we are learning some words in Sami.
That’s another way of collaborating 😉

giellatekno.uit.no/background/giellatekno3.pdf

IXA NLP has been invited to join Europeana v1.0 thematic network
https://www.ehu.eus/ehusfera/ixa/2010/04/28/ixa-nlp-has-been-invited-to-join-europeana-v1-0-thematic-network-2/
Wed, 28 Apr 2010 23:09:47 +0000

Europeana.eu (http://europeana.eu/) is a place for inspiration and ideas. There you can search through the cultural collections of Europe, connect to other user pathways and share your discoveries. Europeana.eu is funded by the European Commission and the member states. It links you to 6 million digital items.

* Images – paintings, drawings, maps, photos and pictures of museum objects
* Texts – books, newspapers, letters, diaries and archival papers
* Sounds – music and spoken word from cylinders, tapes, discs and radio broadcasts
* Videos – films, newsreels and TV broadcasts

Some of these are world famous, others are hidden treasures from Europe’s

* museums and galleries
* archives
* libraries
* audio-visual collections

Europeana version 1.0 (http://version1.europeana.eu/web/europeana-project/home) is a 2.5-year project that will bring the Europeana.eu prototype to full service. In 2010 this project will implement a new version of Europeana with added functionality and access to over 10 million digital objects.

This project is the successor network to the EDLnet thematic network, which created the EDL Foundation and the Europeana prototype. Following the launch of the user-designed and user-driven prototype of Europeana, the EDL Foundation wishes to use Europeana v1.0 to develop an operational service and solve key operational issues related to the implementation and functioning of Europeana. Stakeholders, including the general public once the full operational service is offered, will be involved in Europeana and informed about how they can contribute and access content. The work of Europeana v1.0 will include the development and implementation of all the processes necessary to create and run such an operation, as well as a full-scale business development operation to ensure that a steady stream of content is made available. Additionally, dissemination efforts aimed at end users will be carried out to ensure take-up and continuous involvement of end users, in order to make the service sustainable. Among the tools to enlarge user involvement will be the development of generic (web) services that allow others to re-use and re-purpose the data. Key infrastructure components and value-added services will be implemented by the Europeana Connect BPN. Content will come from existing sources such as The European Library and from the linked Europeana Travel, MIMO and Judaica proposals, as well as the results of current eContentPlus projects: Athena, EFG and Europeana Local.


version1.europeana.eu/web/europeana-project/universities/
