Information Retrieval – Ixa Group. Language Technology. https://www.ehu.eus/ehusfera/ixa News from the Ixa Group in the University of the Basque Country Wed, 03 Dec 2014 15:13:54 +0000

Koldo Mitxelena award for PhD theses to Arantxa Otegi https://www.ehu.eus/ehusfera/ixa/2013/02/07/koldo-mitxelena-award-for-phd-theses-to-arantxa-otegi/ Thu, 07 Feb 2013 17:39:59 +0000

Last January our colleague Arantxa Otegi won the III Koldo Mitxelena Award for PhD Theses, organized by Euskaltzaindia (the Academy of the Basque Language) and the University of the Basque Country.

CONGRATULATIONS Arantxa!

Congratulations to her supervisors (Xabier Arregi and Eneko Agirre).

The title of this thesis is ‘Expansion for information retrieval: contribution of word sense disambiguation and semantic relatedness’.

The whole text is available here. This is the abstract:

Information retrieval (IR) aims at finding documents that satisfy the information need of a user. An IR system thus informs the user about relevant documents, that is, those documents that contain the information they need as formulated in the query. Well-known search engines like Google and Yahoo are prime examples of IR systems.
A perfect IR system should retrieve only, and all, the relevant documents, rejecting the non-relevant ones. However, perfect retrieval systems do not exist. One of the main problems is the so-called vocabulary mismatch problem between query and documents: some documents might be relevant to the query even if the specific terms used differ substantially, and some documents might not be relevant to the query even if they have some terms in common. The former is because several words or phrases can be used to express the same idea or item (synonymy). The latter is caused by ambiguity, where one word can have more than one interpretation depending on the context. Owing to these facts, if an IR system relies only on terms occurring in both the query and the document when deciding whether a document is relevant, it might be difficult to find some of the interesting documents, and also to reject non-relevant documents. It seems fair to think that there will be more chances of successful retrieval if the meaning of the text is also taken into account.
Even though the vocabulary mismatch problem has been widely discussed in the literature from the early days of IR, it remains unsolved, and most search engines just ignore it. This PhD dissertation explores whether natural language processing (NLP) can be used to alleviate this problem.
In a nutshell, we expand queries and documents making use of two NLP techniques, word sense disambiguation and semantic relatedness. For each of these techniques we propose an expansion strategy, in which we obtain synonyms and other related words for the words in the query and documents. We also present, for each case, a method to combine the expansions and original words effectively in an IR system. Furthermore, as the expansion technique we propose is useful for translating queries and documents, we show how a cross-lingual information retrieval system could be improved using such an expansion technique.

Our extensive experiments on three datasets show that the expansion methods explored in this dissertation help overcome the mismatch problem, consequently improving the effectiveness of an IR system.
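
To make the idea of expansion concrete, here is a minimal Python sketch. It is not the method of the thesis: it uses neither word sense disambiguation nor semantic relatedness, only a hand-written toy thesaurus. The names `RELATED`, `expand`, `score` and the weight of 0.5 for expansion terms are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption for this sketch); a real system
# would obtain related words from a lexical resource or an NLP component.
RELATED = {
    "car": ["automobile", "vehicle"],
    "price": ["cost"],
}


def expand(terms, related=RELATED, expansion_weight=0.5):
    """Map each term to a weight: original terms get 1.0, expansions less."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        for r in related.get(t, []):
            # setdefault keeps the weight of a term that is already present,
            # so an original query term is never down-weighted.
            weights.setdefault(r, expansion_weight)
    return weights


def score(query_weights, document):
    """Sum, over the query terms, weight times term frequency in the document."""
    tf = Counter(document.lower().split())
    return sum(w * tf[t] for t, w in query_weights.items())


docs = [
    "automobile cost rises again",  # relevant, but shares no term with the query
    "car park closed today",        # shares a term, but is not about car prices
    "weather report for monday",
]
query = ["car", "price"]

plain_scores = [score({t: 1.0 for t in query}, d) for d in docs]
expanded_scores = [score(expand(query), d) for d in docs]

for doc, p, e in zip(docs, plain_scores, expanded_scores):
    print(f"{p:.1f} -> {e:.1f}  {doc}")
```

Without expansion the first document gets a score of zero, because it shares no term with the query; with expansion it is retrieved through the synonyms. The second document still matches on "car" alone, which is the ambiguity side of the mismatch problem that plain synonym lookup does not address.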

Ixa Group at the kick-off meeting of the NewsReader Project https://www.ehu.eus/ehusfera/ixa/2013/02/05/newsreader-ixa/ Tue, 05 Feb 2013 15:50:53 +0000

Ixa Group is one of the five partners in the consortium of the NewsReader Project (EU FP7 programme, grant 316404, Jan. 2013 – Dec. 2015), which was presented on Wednesday 23 January at VU University Amsterdam. These are the five partners in the consortium:

  • VU University Amsterdam
  • The University of the Basque Country
  • Fondazione Bruno Kessler
  • LexisNexis
  • [...]

The volume of news data is enormous and expanding, covering billions of archived documents and millions of documents as daily streams, while at the same time getting more and more interconnected with knowledge provided elsewhere. Professional decision-makers who need to respond quickly to new developments are faced with the problem that current solutions for consulting these archives and streams no longer work. Consequently, it becomes almost impossible to make well-informed decisions, and professionals risk being held liable for decisions based on incomplete, inaccurate and out-of-date information.

NewsReader will develop a decision-support tool that allows professional decision-makers to explore these story lines using visual interfaces and interactions to exploit their explanatory power and their systematic structural implications. The goals are to extract what happened to whom, when, and where; to align this information, storing provenance and not discarding anything; to distinguish unfolding story lines; and to assist financial decision support by explaining current events. Likewise, NewsReader can make predictions from the past on future events or explain new events and developments through the past. The tool will be tested by professional decision-makers in the financial and economic area.

 

BerbaTek project’s results and demos. https://www.ehu.eus/ehusfera/ixa/2012/02/10/berbatek-projects-results-and-demos/ Fri, 10 Feb 2012 16:36:38 +0000

BerbaTek is a recently finished three-year strategic research project (2009-2011) funded by the Industry Department of the Basque Government. To carry out the project, a consortium was created, made up of the Elhuyar Foundation, the IXA and Aholab research groups of the UPV/EHU-University of the Basque Country, the technology centre Vicomtech and the foundation Tecnalia Research & Innovation.

 

Yesterday the results of this project were presented at a press conference with representatives of the Basque Government. Throughout the BerbaTek project we have created several language tools and resources, along with some demos that show the potential of integrating language, voice and multimedia technologies to create applications for the areas that make up the language industry: translation, content and teaching. Three demos were presented at the press conference:

  • Automatic dubbing demo. Automatic dubbing of films is still a difficult challenge (different voices, colloquial language, different speeds), but for some types of documentaries (single speaker, voice-over, lip synchronization unnecessary or unimportant) we have built a demo that performs satisfactorily. Given a documentary in Spanish and its transcription (which can be obtained automatically by means of any of the dictation programs for Spanish on the market), Vicomtech-IK4’s temporal alignment technology creates a subtitle file, that is, a transcription with time marks for the beginning and end of each sentence. Then the Matxin MT system, developed by the IXA group, automatically translates the subtitles into Basque, and Aholab’s text-to-speech technology produces the synchronized voice. We have successfully applied this demo to the single-speaker sections of the television program Teknopolis, produced by Elhuyar. This demo can be seen at work here.
  • Semantic multimedia search engine for science and technology content. This search engine is based on WNTerm, an ontology specialized in science and technology which was created by Elhuyar and IXA. It is a network where scientific and technological terms are semantically related to each other, with subclasses, synonyms, etc. A new augmented version will be presented next month.
  • Personal teacher for language learning. For the field of education, we have created a demo of a personal tutor for language learning. The tutor is a 3D avatar developed by Vicomtech-IK4 that shows emotions, can speak Basque and can understand what is said in Basque, using Aholab’s technology. The tutor assists us in various tasks: we can do grammar exercises (verb conjugation, word inflection) and reading comprehension exercises (filling in gaps in a text, choosing from several options) that are created automatically from texts using technology from IXA; we can evaluate our pronunciation, with Aholab technology; and it helps us when writing texts, with inflection of words, writing of numbers or querying dictionaries, by means of technology from IXA and Elhuyar. For the moment this demo works in local mode, but it will be available online by next spring.

The news has been covered by the media today:

Further information about this project can be found at the BerbaTek project’s website.

Talk. Giovanni Semeraro. Information Retrieval and Information Filtering: two battlefields for NLP techniques (2011/05/06) https://www.ehu.eus/ehusfera/ixa/2011/05/03/semeraro/ Tue, 03 May 2011 13:47:28 +0000

Speakers: Giovanni Semeraro, Pasquale Lops, Marco de Gemmis
Dipartimento di Informatica, Università di Bari
Date: May 6, 2011
Time: 16:00
Where: Computer Science Faculty, Room 3.2

"Information Retrieval and Information Filtering: Two battlefields for NLP techniques"

Part 1: Introduction to basic concepts on:
- Information Retrieval Models: Boolean, Vector space
- Information Filtering techniques
- Recommender Systems
- Problems with classical information seeking strategies
Speaker: Giovanni Semeraro
Expected duration: 75 min.

Part 2: Intelligent Information Access:
- Semantic Indexing using external knowledge sources: WordNet, Wikipedia
- Semantic Indexing for multilingual access
Speaker: Pasquale Lops
Expected duration: 45 min.
- Knowledge Infusion (KI): creating a knowledge base from open knowledge sources
- KI at work: solving a challenging language game
- KI applications for recommender systems
Speaker: Marco de Gemmis
Expected duration: 45 min.