Project – Ixa Group. Language Technology. https://www.ehu.eus/ehusfera/ixa News from the Ixa Group in the University of the Basque Country. Wed, 03 Dec 2014 15:17:48 +0000

BerbaTek project’s results and demos. https://www.ehu.eus/ehusfera/ixa/2012/02/10/berbatek-projects-results-and-demos/ Fri, 10 Feb 2012 16:36:38 +0000

BerbaTek is a recently completed three-year strategic research project (2009-2011) funded by the Industry Department of the Basque Government. To carry out the project, a consortium was created, made up of the Elhuyar Foundation, the IXA and Aholab research groups of the UPV/EHU (University of the Basque Country), the technology centre Vicomtech and the foundation Tecnalia Research & Innovation.

 

Yesterday, the results of this project were presented at a press conference with representatives of the Basque Government. Throughout the BerbaTek project we have created several language tools and resources, along with some demos that show the potential of integrating language, voice and multimedia technologies to create applications for the areas that make up the language industry: translation, content and teaching. Three demos were presented at the press conference:

  • Automatic dubbing demo. The automatic dubbing of films remains a difficult challenge for now (different voices, colloquial language, different speeds), but for some types of documentaries (single speaker, voice-over, lip synchronisation unnecessary or unimportant) we have built a demo that performs satisfactorily. Given a documentary in Spanish and its transcription (which can be obtained automatically by means of any of the dictation programs for Spanish on the market), Vicomtech-IK4’s temporal alignment technology creates a subtitles file, that is, a transcription with time marks for the beginning and end of each sentence. Then the Matxin MT system, developed by the IXA group, automatically translates the subtitles into Basque, and Aholab’s text-to-speech technology produces the synchronized voice. We have successfully applied this demo to the single-speaker sections of the television program Teknopolis, produced by Elhuyar. This demo can be seen at work here.
  • Semantic multimedia search engine for science and technology content. This search engine is based on WNTerm, an ontology specialized in science and technology which was created by Elhuyar and IXA. It is a network in which scientific and technological terms are semantically related to each other through subclasses, synonyms, etc. A new augmented version will be presented next month.
  • Personal teacher for language learning. For the field of education, we have created a demo of a personal tutor for language learning. The tutor is a 3D avatar developed by Vicomtech-IK4 that shows emotions, can speak Basque and can understand what is said in Basque, using Aholab’s technology. The tutor assists us in various tasks: we can do grammar exercises (verb conjugation, word inflection) and reading comprehension exercises (filling in gaps in a text, choosing from several options) that are created automatically from texts using technology from IXA; we can evaluate our pronunciation with Aholab technology; and it helps us when writing texts, with inflection of words, writing of numbers or querying dictionaries, by means of technology from IXA and Elhuyar. For the moment this demo works in local mode, but it will be available online by next spring.

The news has been covered by the media today:

Further information about this project can be found at Berbatek project’s website.

News from OPENMT-2 project https://www.ehu.eus/ehusfera/ixa/2011/04/10/openmt-2/ Sun, 10 Apr 2011 16:28:00 +0000

 

Three pieces of news related to the OPENMT-2 project (2010-2012):

Gorka Labaka’s PhD thesis

In his PhD thesis (“EUSMT: Incorporating Linguistic Information to Statistical Machine Translation for Basque”) Labaka studied how Statistical Machine Translation (SMT) can handle the translation of Spanish into Basque, a morphologically rich and less-resourced language. He found two ways to enhance the quality of the translation by using linguistic tools:

  • The use of morphological tools allowed him to perform translation at the level of word segments, thus avoiding sparseness problems in corpora.
  • Complementarily, the syntactic tools enabled the Spanish word segments to be rearranged into their corresponding order in Basque. This reordering helped the SMT decoder to find correct translations.

Recent research tends to focus on statistical systems and to ignore rule-based approaches. However, according to Gorka Labaka’s evaluation, the rule-based (RBMT) system and the state-of-the-art basic SMT system perform with similar quality when translating into Basque. His improved SMT system, based on segmentation and reordering, outperforms both the RBMT system and the basic SMT system by more than 10% on the HTER metric. Besides, he calculated that a hypothetical oracle system would yield a result a further 10% better; this oracle system would select the improved SMT output for 55% of the sentences, the RBMT output for another 41% of them, and the example-based (EBMT) output for 4%. So he concluded that, at least in the case of morphologically rich languages with few resources, and hence few parallel corpora, the SMT approach is limited and the RBMT approach should not be ignored. Currently, we are experimenting with hybrid architectures combining the Matxin (rule-based) and EUSMT (statistical) translation engines.
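The oracle idea can be illustrated with a small sketch. It is not the evaluation code used in the thesis: it assumes a single reference translation per sentence and uses a plain word-level edit rate (insertions, deletions and substitutions, no block shifts) as a stand-in for HTER. The function names and the English toy sentences are hypothetical.

```python
from typing import Dict, List, Tuple


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_rate(hyp: str, ref: str) -> float:
    """Edits needed to turn hyp into ref, divided by the reference length.

    Simplified stand-in for HTER: real TER/HTER also allows block shifts.
    """
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hyp.split(), ref_words) / len(ref_words)


def oracle_select(outputs: Dict[str, str], ref: str) -> Tuple[str, float]:
    """Pick, for one sentence, the engine whose output has the lowest edit rate."""
    scores = {name: edit_rate(text, ref) for name, text in outputs.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical outputs of three engines for the same source sentence.
    outputs = {
        "smt": "the cat sat on mat",
        "rbmt": "a cat sits on the mat",
        "ebmt": "cat mat",
    }
    engine, score = oracle_select(outputs, "the cat sat on the mat")
    print(engine, round(score, 3))
```

Running `oracle_select` over every sentence of a test set and counting how often each engine wins gives the kind of per-engine share quoted above. An oracle needs the reference, so it is an upper bound for system combination rather than a usable translator.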


Visiting researcher Lluís Màrquez (NLPRG, Technical University of Catalonia, UPC)

With the aim of collaborating in this line of research, Lluís Màrquez, the principal researcher of the UPC team within the OPENMT-2 project, will be in Donostia visiting the Ixa group until the summer. He is an expert in integrating Machine Learning techniques into Language Technology. The first experiments on combining MT engines, carried out by Gorka Labaka, confirmed that there is room for improvement. Now we want to find the most suitable ways to achieve it.


Collaboration on Post-Editing with Basque Wikipedia (eu.wikipedia)

Within this project, a set of 60 long articles from the Spanish Wikipedia (adding up to more than 100,000 words) has been selected and then translated into Basque using Matxin-Opentrad, our open-source rule-based machine translation system. Soon, in spring 2011, a group of users of the Basque Wikipedia will review them using a special interface that we have adapted from OmegaT. They will correct the errors they find; this process is known as post-editing. The changes made by these users will be logged. The corrected articles will be included in the Basque Wikipedia, and the resulting post-editing logs will also be used to enhance the machine translation process, either by manually improving the different modules of the MT system or by implementing an automated statistical post-editing process that is expected to improve translation accuracy. (paper in Wikimania 2010)
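As a rough illustration of what a post-editing log can contain, the sketch below compares a machine-translated sentence with its corrected version and records each word-level change. It is only an assumption about the kind of information that is useful to keep: the function name `post_edit_log` and the log format are hypothetical and do not describe the adapted OmegaT interface, and the toy sentences are in English for readability.

```python
import difflib
from typing import List, Tuple


def post_edit_log(mt_output: str, post_edited: str) -> List[Tuple[str, str, str]]:
    """Return (operation, original words, corrected words) for every change.

    Operations are the difflib opcode tags: "replace", "delete" or "insert".
    """
    src = mt_output.split()
    tgt = post_edited.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    log = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the spans the post-editor changed
            log.append((tag, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return log


if __name__ == "__main__":
    for entry in post_edit_log("he go to school", "he goes to school"):
        print(entry)
```

Pairs of original and corrected segments collected this way are the raw material both for spotting recurrent errors in individual MT modules and for training a statistical post-editing step.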

A new European Project: PATHS https://www.ehu.eus/ehusfera/ixa/2011/02/04/paths_project/ Fri, 04 Feb 2011 13:42:28 +0000

The IXA Group is participating with five other partners in a new European project: PATHS (2010-2012).
The PATHS project (Personalised Access To cultural Heritage Spaces) primarily addresses objective ICT-2009.4.1: Digital Libraries and Digital Preservation. It relates to target outcome (d), adaptive cultural experiences, by creating personalised views of various forms of cultural expression, adapting these views to the background and cognitive context of the user and offering meaningful guidance about the interpretation of cultural works. PATHS will make important progress in this direction.

Europeana: Significant amounts of cultural heritage material are now available through online digital library portals. However, this vast amount of cultural heritage material can also be overwhelming for many users who are provided with little or no guidance on how to find and interpret this information.

The PATHS project will create a system that acts as an interactive personalised tour guide through existing digital library collections. The system will offer suggestions about items to look at and assist in their interpretation. Navigation will be based around the metaphor of a path through the collection. A path can be based around any theme, for example artist and media (“paintings by Picasso”), historic periods (“the Cold War”), places (“Venice”) and famous people (“Muhammad Ali”). Users will be able to construct their own paths or follow pre-defined ones.

The PATHS project will provide users with innovative ways to access and utilise the contents of digital libraries that enrich their experiences of these resources. This will be achieved by extending the state-of-the-art in user-driven information access and by applying language technologies to analyse and enrich online content. The project will take a user-centred approach to development to accommodate the needs, interests and preferences of different types of users.

These goals will be realised through the following objectives:

  • Analysis of users’ requirements for access to Cultural Heritage collections
  • Organisation and enrichment of Cultural Heritage content for use within a navigation system
  • Implementation of a system for navigating Cultural Heritage resources
  • Techniques for providing personalised access to Cultural Heritage content
  • Porting the navigation system for use on mobile devices and Facebook
  • Evaluation with user groups and in field trials

Therefore, the project will carry out research in the following areas:

  • Information Access: The project will develop user-driven navigation through collections of information, gathering the users’ requirements and modelling them.
  • Educational Informatics: Adapting to individual learners, balancing being directed against being allowed the freedom to explore autonomously.
  • Content interpretation and enrichment: Representation and sharing of information about items, and identifying background information related to the items in cultural heritage collections.

The IXA Group will work mainly on content processing and enrichment. This means that content from Cultural Heritage sources will be processed into a multi-layered network and augmented with additional information that will enrich the user’s experience. The additional information will include links between items in the collection and links to external sources such as Wikipedia or other relevant collections. The resulting multi-layered network will form the basis for the paths used to navigate the collection.

The PATHS consortium contains six partners.

CLARIN Meeting in Donostia. May 2010 https://www.ehu.eus/ehusfera/ixa/2010/06/06/clarin-meeting-in-donostia-may-2010/ Sun, 06 Jun 2010 22:15:40 +0000

CLARIN meeting
10:00: Steven Krauwer. CLARIN project Coordinator.
10:30: Nuria Bel (Pompeu Fabra University). Coordinator of CLARIN in Spain.
11:00: Coffee break
11:30-13:00: Presentation of Basque groups (I)

* Miriam Urkia. Euskaltzaindia
* Miren Azkarate. Euskara institutua. UPV/EHU
* Mikel Santesteban. Gogo Elebiduna. UPV/EHU
* Antton Gurrutxaga and Iñaki San Vicente. Elhuyar I+G

13:00: Lunch
14:00-15:00: Presentation of Basque groups (II)

* Igone Zabala. Euskal Filologia saila. UPV/EHU
* Ibon Aizpurua. Eleka.
* Jon Sánchez. Aholab. UPV/EHU.
* Kepa Sarasola. IXA Group. UPV/EHU

15:00-15:30 Conclusions

IXA NLP has been invited to join Europeana v1.0 thematic network https://www.ehu.eus/ehusfera/ixa/2010/04/28/ixa-nlp-has-been-invited-to-join-europeana-v1-0-thematic-network-2/ Wed, 28 Apr 2010 23:09:47 +0000

Europeana.eu (http://europeana.eu/) is a place for inspiration and ideas. There you can search through the cultural collections of Europe, connect to other user pathways and share your discoveries. Europeana.eu is funded by the European Commission and the member states. It links you to 6 million digital items.

* Images – paintings, drawings, maps, photos and pictures of museum objects
* Texts – books, newspapers, letters, diaries and archival papers
* Sounds – music and spoken word from cylinders, tapes, discs and radio broadcasts
* Videos – films, newsreels and TV broadcasts

Some of these are world famous, others are hidden treasures from Europe’s

* museums and galleries
* archives
* libraries
* audio-visual collections

Europeana version 1.0 (http://version1.europeana.eu/web/europeana-project/home) is a 2.5-year project that will bring the Europeana.eu prototype to full service. In 2010 this project will implement a new version of Europeana with added functionality and access to over 10 million digital objects.

This project is the successor network to the EDLnet thematic network, which created the EDL Foundation and the Europeana prototype. Following the launch of the user-designed and user-driven prototype of Europeana, the EDL Foundation wishes to use Europeana v1.0 to develop an operational service and solve key operational issues related to the implementation and functioning of Europeana. Stakeholders, including the general public once the full operational service is offered, will be involved in Europeana and informed about how they can contribute and access content. The work of Europeana v1.0 will include the development and implementation of all the processes necessary to create and run such an operation, as well as a full-scale business development operation to ensure that a steady stream of content is made available. Additionally, dissemination efforts aimed at end users will be carried out to ensure take-up and continuous involvement of end users, in order to make the service sustainable. Among the tools to enlarge user involvement will be the development of generic (web) services that allow others to re-use and re-purpose the data. Key infrastructure components and value-added services will be implemented by the Europeana Connect BPN. Content will come from existing sources such as The European Library and from the linked Europeana Travel, MIMO and Judaica proposals, as well as from the results of current eContentPlus projects: Athena, EFG and Europeana Local.


version1.europeana.eu/web/europeana-project/universities/
