Yesterday, the results of this project were presented at a press conference with representatives of the Basque Government. Throughout the BerbaTek project we have created several language tools and resources, along with some demos that show the potential of integrating language, voice and multimedia technologies to build applications for the areas that make up the language industry: translation, content and teaching. Three demos were presented at the press conference:
The news has been covered by the media today:
Further information about this project can be found on the BerbaTek project’s website.
Three pieces of news related to the OPENMT-2 project (2010-2012):
Gorka Labaka’s PhD thesis
In his PhD thesis (“EUSMT: Incorporating Linguistic Information to Statistical Machine Translation for Basque”) Labaka studied how Statistical Machine Translation (SMT) can handle the translation of Spanish into Basque, a morphologically rich and less-resourced language. He found two ways to enhance the quality of the translation by using linguistic tools:
Recent research tends to focus on statistical systems and to ignore rule-based approaches. However, according to Gorka Labaka’s evaluation, the rule-based (RBMT) system and the state-of-the-art basic SMT system reach a similar level of quality when translating into Basque. His improved SMT system, based on segmentation and reordering, outperforms both the RBMT system and the basic SMT system by more than 10% on the HTER metric. He also calculated that a hypothetical oracle system would yield a result a further 10% better; this oracle would select the improved SMT output for 55% of the sentences, the RBMT output for another 41%, and the example-based (EBMT) output for the remaining 4%. He therefore concluded that, at least for morphologically rich languages with few resources, and hence few parallel corpora, the SMT approach is limited and the RBMT approach should not be ignored. We are currently experimenting with hybrid architectures that combine the Matxin (rule-based) and EUSMT (statistical) translation engines.
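The oracle idea can be illustrated with a minimal sketch: for every sentence, pick the system whose output has the lowest HTER, then report how often each system was chosen. This is not the evaluation code used in the thesis; the function name `oracle_select`, the system labels and the scores below are invented for illustration, and averaging sentence-level scores is a simplification, since corpus-level HTER is normally weighted by sentence length.

```python
from collections import Counter


def oracle_select(scores_by_system):
    """Pick, per sentence, the system with the lowest HTER (lower is better).

    scores_by_system maps a system name to a list of per-sentence scores;
    all lists must be non-empty and cover the same sentences in the same order.
    Returns (choices, shares, mean_oracle_score).
    """
    systems = list(scores_by_system)
    n = len(scores_by_system[systems[0]])
    if n == 0 or any(len(scores_by_system[s]) != n for s in systems):
        raise ValueError("every system needs one score per sentence")

    choices = []
    best_scores = []
    for i in range(n):
        # Ties go to the system listed first in the dictionary.
        best = min(systems, key=lambda s: scores_by_system[s][i])
        choices.append(best)
        best_scores.append(scores_by_system[best][i])

    # Fraction of sentences for which each system was selected.
    shares = {s: count / n for s, count in Counter(choices).items()}
    return choices, shares, sum(best_scores) / n


# Invented toy scores, not results from the thesis.
toy_scores = {
    "improved_smt": [0.2, 0.5, 0.3, 0.4],
    "rbmt": [0.3, 0.4, 0.6, 0.1],
}
choices, shares, mean_score = oracle_select(toy_scores)
print(choices, shares, mean_score)
```

An oracle of this kind is only an upper bound: it needs the post-edited reference to compute the scores, so a real hybrid system has to predict the best output without them.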
To collaborate on this line of research, Lluis Marquez, the main researcher of the UPC team within the OPENMT-2 project, will be in Donostia visiting the Ixa group until the summer. He is an expert in integrating Machine Learning techniques into Language Technology. The first experiments on combining MT engines, carried out by Gorka Labaka, confirmed that there is room for improvement. Now we want to find the most suitable ways to achieve it.
Within this project, a set of 60 long articles from the Spanish Wikipedia (adding up to more than 100,000 words) has been selected and translated into Basque using Matxin-Opentrad, our open-source rule-based machine translation system. Soon, in spring 2011, a group of Basque Wikipedia users will review them using a special interface that we have adapted from OmegaT. They will correct the errors they find, a process known as post-editing, and the changes they make will be logged. The corrected articles will be added to the Basque Wikipedia; in addition, the resulting post-editing logs will be used to enhance the machine translation process, either by manually improving the different modules of the MT system or by implementing an automated statistical post-editing step that is expected to improve translation accuracy. (paper in Wikimania 2010)
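As a rough illustration of what a post-editing log could contain, the sketch below compares a machine-translated sentence with its corrected version and records only the token-level changes. It uses Python's standard `difflib` module and is an assumption for illustration only: the function name `post_edit_log`, the record layout and the English example sentences are hypothetical, and this is not the logging format of the adapted OmegaT interface.

```python
import difflib


def post_edit_log(mt_sentence, edited_sentence):
    """Return the token-level changes a post-editor made to an MT sentence.

    Each entry records the operation ("replace", "delete" or "insert"),
    the tokens of the MT output that were affected and the tokens that
    appear in their place after post-editing.
    """
    mt_tokens = mt_sentence.split()
    pe_tokens = edited_sentence.split()
    matcher = difflib.SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)

    log = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged spans are not logged
        log.append({
            "op": tag,
            "mt": mt_tokens[i1:i2],
            "post_edited": pe_tokens[j1:j2],
        })
    return log


# Invented example sentences, used only to show the shape of the log.
for entry in post_edit_log("the cat sat on mat", "the cat sat on the mat"):
    print(entry)
```

Pairs of affected and corrected tokens collected this way over many sentences are the kind of data on which a statistical post-editing step could be trained.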
The IXA Group is participating, together with five other partners, in a new European project: PATHS (2010-2012).
The PATHS project (Personalised Access To cultural Heritage Spaces) primarily addresses objective ICT-2009.4.1: Digital Libraries and Digital Preservation. It relates to target outcome (d), adaptive cultural experiences, by creating personalised views of various forms of cultural expression, adapting these views to the background and cognitive context of the user and offering meaningful guidance about the interpretation of cultural works. PATHS will make important progress in this direction.
Europeana: Significant amounts of cultural heritage material are now available through online digital library portals. However, this vast amount of cultural heritage material can also be overwhelming for many users who are provided with little or no guidance on how to find and interpret this information.
The PATHS project will create a system that acts as an interactive personalised tour guide through existing digital library collections. The system will offer suggestions about items to look at and assist in their interpretation. Navigation will be based around the metaphor of a path through the collection. A path can be based around any theme, for example artist and media (“paintings by Picasso”), historic periods (“the Cold War”), places (“Venice”) and famous people (“Muhammad Ali”). Users will be able to construct their own paths or follow pre-defined ones.
The PATHS project will provide users with innovative ways to access and utilise the contents of digital libraries that enrich their experiences of these resources. This will be achieved by extending the state-of-the-art in user-driven information access and by applying language technologies to analyse and enrich online content. The project will take a user-centred approach to development to accommodate the needs, interests and preferences of different types of users.
These goals shall be realised through the following objectives:
Therefore, the project will conduct research in the following areas:
Information Access: The project will develop user-driven navigation through collections of information, gathering the users’ requirements and modelling them.
Content interpretation and enrichment: Representing and sharing information about items, and identifying background information related to the items in cultural heritage collections.
The IXA Group will work mainly on content processing and enrichment. This means that content from Cultural Heritage sources will be processed into a multi-layered network and augmented with additional information that will enrich the user’s experience. The additional information will include links between items in the collection and links to external sources such as Wikipedia or other relevant collections. The resulting multi-layered network will form the basis for the paths used to navigate the collection.
The PATHS consortium contains six partners.
10:00: Steven Krauwer. CLARIN project Coordinator.
10:30: Nuria Bel (Pompeu Fabra University). Coordinator of CLARIN in Spain.
11:00: Coffee break
11:30-13:00 Presentation of Basque groups (I)
* Miriam Urkia. Euskaltzaindia
* Miren Azkarate. Euskara institutua. UPV/EHU
* Mikel Santesteban. Gogo Elebiduna. UPV/EHU
* Antton Gurrutxaga and Iñaki San Vicente. Elhuyar I+G
13:00: Lunch
14:00-15:00 Presentation of Basque groups (II)
* Igone Zabala. Euskal Filologia saila. UPV/EHU
* Ibon Aizpurua. Eleka.
* Jon Sánchez. Aholab. UPV/EHU.
* Kepa Sarasola. IXA Group. UPV/EHU
15:00-15:30 Conclusions
* Images – paintings, drawings, maps, photos and pictures of museum objects
* Texts – books, newspapers, letters, diaries and archival papers
* Sounds – music and spoken word from cylinders, tapes, discs and radio broadcasts
* Videos – films, newsreels and TV broadcasts
Some of these are world famous, others are hidden treasures from Europe’s
* museums and galleries
* archives
* libraries
* audio-visual collections
Europeana version 1.0 (http://version1.europeana.eu/web/europeana-project/home) is a 2.5-year project that will bring the Europeana.eu prototype to full service. In 2010 this project will implement a new version of Europeana with added functionality and access to over 10 million digital objects.
This project is the successor to the EDLnet thematic network, which created the EDL Foundation and the Europeana prototype. Following the launch of the user-designed and user-driven prototype of Europeana, the EDL Foundation wishes to use Europeana v1.0 to develop an operational service and to solve key operational issues related to the implementation and functioning of Europeana. Stakeholders, including the general public once the full operational service is offered, will be involved in Europeana and informed about how they can contribute and access content. The work of Europeana v1.0 will include developing and implementing all the processes needed to create and run such an operation, as well as a full-scale business development operation to ensure that a steady stream of content is made available. Additionally, dissemination efforts aimed at end users will be carried out to ensure take-up and the continuous involvement of end users, in order to make the service sustainable. Among the tools for increasing user involvement will be generic (web) services that allow others to re-use and re-purpose the data. Key infrastructure components and value-added services will be implemented by the Europeana Connect BPN. Content will come from existing sources such as The European Library and from the linked Europeana Travel, MIMO and Judaica proposals, as well as from the results of current eContentPlus projects: Athena, EFG and Europeana Local.
version1.europeana.eu/web/europeana-project/universities/