information extraction – Ixa Group. Language Technology.
https://www.ehu.eus/ehusfera/ixa
News from the Ixa Group in the University of the Basque Country

Koldo Mitxelena award for PhD theses to Arantxa Otegi
https://www.ehu.eus/ehusfera/ixa/2013/02/07/koldo-mitxelena-award-for-phd-theses-to-arantxa-otegi/
Thu, 07 Feb 2013 17:39:59 +0000

Last January our colleague Arantxa Otegi won the III Koldo Mitxelena Award for PhD Theses, organized by Euskaltzaindia (the Academy of the Basque Language) and the University of the Basque Country.

CONGRATULATIONS Arantxa!

Congratulations to her supervisors (Xabier Arregi and Eneko Agirre).

The title of this thesis is ‘Expansion for information retrieval: contribution of word sense disambiguation and semantic relatedness’.

The whole text is available here. This is the abstract:

Information retrieval (IR) aims at finding documents that satisfy the information need of a user. To that end, an IR system informs the user about relevant documents, that is, those documents that contain the information they need as formulated in the query. Well-known search engines like Google and Yahoo are prime examples of IR systems.
A perfect IR system should retrieve only, and all, the relevant documents, rejecting the non-relevant ones. However, perfect retrieval systems do not exist. One of the main problems is the so-called vocabulary mismatch problem between query and documents: some documents might be relevant to the query even if the specific terms used differ substantially, or some documents might not be relevant to the query even if they have some terms in common. The former is because several words or phrases can be used to express the same idea or item (synonymy). The latter is caused by ambiguity, where one word can have more than one interpretation depending on the context. Owing to these facts, if an IR system relies only on terms occurring in both the query and the document when deciding whether a document is relevant, it might be difficult to find some of the interesting documents, and also to reject non-relevant documents. It seems fair to think that there will be more chances of successful retrieval if the meaning of the text is also taken into account.
Even though the vocabulary mismatch problem has been widely discussed in the literature from the early days of IR, it remains unsolved, and most search engines just ignore it. This PhD dissertation explores whether natural language processing (NLP) can be used to alleviate this problem.
In a nutshell, we expand queries and documents making use of two NLP techniques, word sense disambiguation and semantic relatedness. For each of the mentioned techniques we propose an expansion strategy, in which we obtain synonyms and other related words for the words in the query and documents. We also present, for each case, a method to combine the expansions and original words effectively in an IR system. Furthermore, as the expansion technique we propose is useful for translating queries and documents, we show how a cross-lingual information retrieval system could be improved using such an expansion technique.

Our extensive experiments on three datasets show that the expansion methods explored in this dissertation help overcome the mismatch problem, consequently improving the effectiveness of an IR system.
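To make the idea of expansion concrete, here is a minimal sketch in Python. It is not the method of the thesis: the `RELATED` table is a hypothetical hand-written thesaurus standing in for the output of word sense disambiguation or semantic relatedness, and the `weight` parameter and the overlap score are assumptions chosen only to show how expansions and original words can be combined with different weights.

```python
from collections import Counter

# Hypothetical toy thesaurus (an assumption for illustration). A real system
# would obtain related words via word sense disambiguation or semantic
# relatedness over a lexical resource.
RELATED = {
    "car": ["automobile", "vehicle"],
    "doctor": ["physician"],
}


def expand(terms, weight=0.5):
    """Map each term to a weight: original terms get 1.0, expansions get `weight`."""
    weights = {}
    for term in terms:
        weights[term] = 1.0
        for related in RELATED.get(term, []):
            # setdefault keeps an existing weight, so an expansion never
            # overrides a term that the user typed in the query.
            weights.setdefault(related, weight)
    return weights


def score(query, document, weight=0.5):
    """Weighted term-overlap score between the expanded query and a document."""
    query_weights = expand(query.lower().split(), weight)
    doc_counts = Counter(document.lower().split())
    return sum(w * doc_counts[term] for term, w in query_weights.items())
```

Without expansion (`weight=0.0`), the query "car repair" matches the document "automobile repair shop" only through "repair"; with expansion, "automobile" also contributes, at a lower weight than an original query term.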

Ixa Group at the kick-off meeting of the NewsReader Project
https://www.ehu.eus/ehusfera/ixa/2013/02/05/newsreader-ixa/
Tue, 05 Feb 2013 15:50:53 +0000

Ixa Group is one of the five partners in the consortium of the NewsReader Project (EU FP7 programme, grant 316404, Jan. 2013 – Dec. 2015), which was presented on Wednesday 23 January at VU University Amsterdam. These are the five partners in the consortium:

* VU University Amsterdam
* The University of the Basque Country
* Fondazione Bruno Kessler
* LexisNexis
* [...]

The volume of news data is enormous and expanding, covering billions of archived documents and millions of documents as daily streams, while at the same time getting more and more interconnected with knowledge provided elsewhere. Professional decision-makers who need to respond quickly to new developments are faced with the problem that current solutions for consulting these archives and streams no longer work. Consequently, it becomes almost impossible to make well-informed decisions, and professionals risk being held liable for decisions based on incomplete, inaccurate and out-of-date information.

NewsReader will develop a decision-support tool that allows professional decision-makers to explore these story lines using visual interfaces and interactions to exploit their explanatory power and their systematic structural implications. The goals are to extract what happened to whom, when, and where; to align this information, storing provenance and not discarding anything; to distinguish unfolding story lines; and to assist financial decision support by explaining current events. Likewise, NewsReader can make predictions about future events from the past, or explain new events and developments through the past. The tool will be tested by professional decision-makers in the financial and economic area.

 

Talk. Dan Jurafsky. Extracting many kinds of meaning from text and speech. (2011/09/13)
https://www.ehu.eus/ehusfera/ixa/2011/09/07/talk-dan-jurafsky-extracting-many-kinds-of-meaning-from-text-and-speech-20110913/
Wed, 07 Sep 2011 12:25:07 +0000

Speaker: Professor Dan Jurafsky (Stanford University)
Date: September 13, 2011
Time: 16:00
Where: Computer Science Faculty

Title: Extracting many kinds of meaning from text and speech.
Abstract:
Understanding natural language, while one of the oldest goals of artificial intelligence, is immensely difficult because language expresses so many kinds of meanings, embedded as it is in the rich social world of humans. In this talk I discuss work in our lab on extracting three kinds of meaning that link to the human world. We show how to learn world knowledge about events and their participants, ‘narrative schemes’ about how the world works, in a purely unsupervised way from large bodies of text. We show a new algorithm for the task of ‘coreference’: deciding when two mentions in a text refer to the same person or organization. Finally, we show how to automatically detect human interpersonal stances from speech and text cues in spoken conversation, detecting whether a speaker is friendly, awkward, or flirtatious. This talk describes joint work with Nate Chambers, Angel Chang, Heeyoung Lee, Chris Manning, Dan McFarland, Yves Peirsman, Karthik Raghunathan, Rajesh Ranganath, and Mihai Surdeanu.
BIO:
Dan Jurafsky is Professor of Linguistics and Professor by Courtesy of Computer Science at Stanford University. Dan received a B.A. in Linguistics in 1983 and a Ph.D. in Computer Science in 1992, both from the University of California at Berkeley, and also taught at the University of Colorado, Boulder. His research focuses on natural language understanding as well as the application of natural language processing to the behavioral and social sciences. Other research interests include the linguistics of Chinese and the linguistics of food. He is the recipient of a MacArthur Fellowship and is the co-author, with Jim Martin, of the widely used textbook “Speech and Language Processing”. It was the first book to include in-depth descriptions of both text and speech technology. As teachers and students of language technology, we know this fine book very well.

Invited talk: Computational Semantics and Pragmatics (Rodolfo Delmonte, 2011/01/17,18)
https://www.ehu.eus/ehusfera/ixa/2011/01/14/delmonte2011/
Fri, 14 Jan 2011 22:20:34 +0000

Speaker: Rodolfo Delmonte (Università Ca’ Foscari, Venice, Italy)
Date: January 17 and 18, 2011
Time: 16:00 – 19:30
Where: Computer Science Faculty

ABSTRACT
These two sessions cover some of the most important aspects of Computational Semantics and Pragmatics including:
* Lexical Representations and Argument Structure
* Parsing with constituency or dependency structure
* Co-reference resolution
* Underspecified arguments
* Argumentative structure, subjectivity, factuality and sentiment analysis
* Textual Entailment
The talks follow a linguistically motivated approach with the use of ontologies and similar resources to deal with co-reference or textual entailment tasks. The talks are accompanied by several applications and demonstrations.

SHORT BIO
Rodolfo Delmonte is Associate Professor of Computational Linguistics at the University of Venice, where he is in charge of the corresponding course at BA, MA and Ph.D. level. A specialist in experimental phonetics and computational linguistics, he presents his research work at major international conferences and publishes articles in international journals. He is a referee for, and publishes in, Speech Communication, International Journal of Speech Technologies, Journal of Natural Language Engineering and international conferences every year. He has been an invited speaker at a number of conferences, a teacher at international schools and, in the last five years, an invited professor in Boulder, Colorado at the CLSR, in Besançon at the Centre Tesnière, and in Dallas at UTD. Hot topics of his latest research include: implicit entities and antecedents of omitted and underspecified arguments; argumentative analysis, subjectivity, factuality and sentiment analysis.

project.cgm.unive.it/delmonte.html

Jon Patrick’s invited talk: ‘Medical NLP and Engineering. An NLP Workbench for it’ (2010/02/12)
https://www.ehu.eus/ehusfera/ixa/2010/02/09/jon-patrick-s-invited-talk-medical-nlp-and-engineering-an-nlp-workbench-for-it-20100212/
Tue, 09 Feb 2010 17:59:49 +0000

Speaker: Jon Patrick (University of Sydney)
Date: February 12, 2010
Time: 16:00
Where: Computer Science Faculty, room 3.17

NLP systems for use in medical applications bring new problems not considered by classical methods. Broadly speaking medical texts have three genres: published papers, clinical reports, clinical notes.

Information Extraction (IE) and Question Answering (QA) are the most common needs for NLP among clinical staff. Published papers are amenable to classical methods, apart from needing coverage for many specialised terms. Clinical reports bring new problems due to the use of specialised clinical terms, highly stylised content for scores, weights and measures and, to a lesser degree, a specialised grammatical structure. Clinical notes have these problems and many more, such as acronyms, neologisms, personal abbreviations, a high level of spelling errors due to mistyping and second-language speakers, poor grammatical structure, and multiple authors of a single document.

It is important to overcome these limitations in the text, as they represent a large proportion of the content (up to 30%), and to reach the ultimate processing objective of very high accuracy, say 95+% for information extraction, given that people’s lives depend on decisions made at the bedside using our tools.

We have designed a software architecture to tackle these problems, whereby new knowledge discovered incrementally about the text is immediately fed back into the knowledge resources of the language processing system, so that it is continually improved at each phase of the processing.
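The feedback idea can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the architecture presented in the talk: `KnowledgeBase`, `process_notes` and the `resolve_unknown` callback are hypothetical names, and the lexicon of abbreviations stands in for the much richer knowledge resources of a real system.

```python
class KnowledgeBase:
    """Toy lexicon mapping surface forms (abbreviations, misspellings) to canonical terms."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def normalise(self, token):
        """Return the canonical term for `token`, or None if it is unknown."""
        return self.entries.get(token.lower())


def process_notes(notes, kb, resolve_unknown):
    """Normalise each note, feeding newly resolved tokens back into `kb` at once."""
    results = []
    for note in notes:
        normalised = []
        for token in note.split():
            canonical = kb.normalise(token)
            if canonical is None:
                # Hypothetical discovery step, e.g. a spelling corrector
                # or a human reviewer; returns None when it cannot help.
                canonical = resolve_unknown(token)
                if canonical is not None:
                    # Feedback: later tokens and notes benefit immediately.
                    kb.entries[token.lower()] = canonical
            normalised.append(canonical if canonical is not None else token)
        results.append(" ".join(normalised))
    return results
```

Once an abbreviation such as "pt" has been resolved in one note, the lexicon already contains it when the next note is processed, so the discovery step is not invoked for it again.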

Jon
www.hcsnet.edu.au/user/201
