Semantics – Ixa Group. Language Technology.
https://www.ehu.eus/ehusfera/ixa
News from the Ixa Group in the University of the Basque Country
Mon, 04 Sep 2017 08:46:41 +0000

PhD Thesis: Computational Model for Semantic Textual Similarity (A. Gonzalez, 2017/07/07)
https://www.ehu.eus/ehusfera/ixa/2017/07/06/phd-thesis-computational-model-for-semantic-textual-similarity-a-gonzalez-20150707/
Thu, 06 Jul 2017 17:43:30 +0000

Title: Computational Model for Semantic Textual Similarity
Author: Aitor Gonzalez-Agirre
Supervisors: German Rigau i Claramunt  / Eneko Agirre Bengoa (Ixa Group)
Date: July 7, 2017, Friday
Time: 11:00
Where:  Faculty of Informatics, Ada Lovelace Room (UPV/EHU)

Abstract:

The goal is to advance computational models of meaning and their evaluation. We define two tasks: Semantic Textual Similarity (STS) and Typed Similarity.

STS measures the degree of semantic equivalence between two sentences. We have collected sentence pairs to build STS datasets, 15,436 pairs in total, by far the largest collection of data for STS. We have designed, built and evaluated a new approach that uses a cube to combine knowledge-based and corpus-based methods.
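
As a rough illustration of what an STS system outputs, the sketch below scores a sentence pair with bag-of-words cosine similarity and rescales the result to a 0 to 5 range. This is a simple lexical baseline picked for illustration only; it is not the combined knowledge-based and corpus-based approach of the thesis, and the function names and the 0 to 5 scale are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"\w+", sentence.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two bag-of-words count vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    # An empty sentence has no direction, so treat it as unrelated.
    return dot / norm if norm else 0.0


def sts_score(a: str, b: str, scale: float = 5.0) -> float:
    """Map the cosine (0..1) onto a 0..scale score (assumed range)."""
    return scale * cosine_similarity(a, b)


if __name__ == "__main__":
    print(sts_score("A man is playing a guitar.", "A man plays the guitar."))
    print(sts_score("A man is playing a guitar.", "The stock market fell today."))
```

A purely lexical score like this misses paraphrases that share no words, which is one reason to bring in knowledge-based and corpus-based evidence.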

Typed Similarity tries to identify the type of relation that holds between a pair of similar items in a digital library. Providing a reason why items are similar has applications in recommendation, personalization, and search. A range of similarity types was identified in this collection, and a set of 1,500 pairs of items from the collection was annotated using crowdsourcing.

We present systems that resolve the Typed Similarity task.

Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-9)
https://www.ehu.eus/ehusfera/ixa/2015/01/07/ninth-workshop-on-syntax-semantics-and-structure-in-statistical-translation-ssst-9/
Wed, 07 Jan 2015 16:52:24 +0000

Eneko Agirre and Nora Aranberri, of the Ixa Group at the University of the Basque Country, together with Dekai Wu, Hong Kong University of Science and Technology (HKUST), and Marine Carpuat, National Research Council (NRC) Canada, are the organizers of SSST-9, the “Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation”, which takes place in Denver, Colorado, USA, on June 4, 2015 (NAACL HLT 2015 / SIGMT / SIGLEX).

This Workshop seeks to bring together a large number of researchers working on diverse aspects of structure, semantics and representation in relation to statistical machine translation. Since its first edition in 2006, its program each year has comprised high-quality papers discussing current work spanning topics including: new grammatical models of translation; new learning methods for syntax- and semantics-based models; formal properties of synchronous/transduction grammars (hereafter S/TGs); discriminative training of models incorporating linguistic features; using S/TGs for semantics and generation; and syntax- and semantics-based evaluation of machine translation.

QTLeap project Best Paper Award

This year SSST-9 will give a best paper award to a paper that advances MT using semantics and deep language processing.

This award is sponsored by the European Union QTLeap project.

The IXA Group is a partner in QTLeap (Quality Translation by Deep Language Engineering Approaches), a project run by a European consortium in which it works with seven other partners. The consortium comprises: Bulgarian Academy of Sciences, Charles University in Prague, German Research Center for Artificial Intelligence, Higher Functions Lda., Humboldt University in Berlin, University of the Basque Country, University of Groningen and University of Lisbon. For more information and contact details please visit: qtleap.eu.

Organizers

Important Dates

Submission deadline for papers and extended abstracts: 8 Mar 2015
Notification to authors: 24 Mar 2015
Camera copy deadline: 3 Apr 2015

The Basque WordNet semantic dictionary is a “public resource” now
https://www.ehu.eus/ehusfera/ixa/2014/06/13/the-basque-wordnet-semantic-dictionary-is-a-public-resource-now/
Fri, 13 Jun 2014 13:13:04 +0000

Machines need computing tools that are more powerful than conventional dictionaries for tasks like information extraction, disambiguation of word meanings, etc. This is in fact the function of the Euskal WordNet application, developed by the IXA Group (UPV/EHU), which can already be consulted and downloaded free of charge.

This is the first Lexical Knowledge Base (LKB) developed for the Basque language: a “semantic dictionary” or “store” that compiles and organises lexical and semantic information. “It’s like a database, but the difference is that it not only gathers the usual information of a dictionary —the meanings of words and their corresponding definitions and examples—, it also links the concepts with each other,” pointed out Eneko Agirre, an IXA Group computer programmer.

If we look up the entry hatz (“finger”, “digit” or “toe” in Basque), the result is as follows: “Each of the five appendages at the end of human hands and feet.” That is what the term means. But apart from this information, we can get much more: the finger/toe is an appendage of the body; the thumb is a finger; fingers are part of the hand; hands, in turn, are part of the arm and fingers are used to touch objects, etc. In short: all the concepts are interrelated hierarchically. Every concept is also related to its equivalents in other languages: digit, hatz, dedo, dixito and dit.

Consulting the word hatz in Basque WordNet.
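
The relations described above for hatz can be sketched as a small graph of concepts joined by typed links. The code below is a toy model written for this illustration, not the Basque WordNet data format or API; the class name, the relation names (`is_a`, `part_of`) and the language codes attached to each equivalent are assumptions.

```python
from collections import defaultdict


class LexicalKB:
    """Toy lexical knowledge base: concepts linked by typed relations."""

    def __init__(self) -> None:
        # relation name -> source concept -> set of target concepts
        self.relations = defaultdict(lambda: defaultdict(set))
        # concept -> {language code: word}
        self.lexicalizations = defaultdict(dict)

    def add_relation(self, source: str, relation: str, target: str) -> None:
        self.relations[relation][source].add(target)

    def add_word(self, concept: str, language: str, word: str) -> None:
        self.lexicalizations[concept][language] = word

    def closure(self, concept: str, relation: str) -> list[str]:
        """Follow one relation transitively, e.g. finger -> hand -> arm."""
        seen, frontier = [], [concept]
        while frontier:
            current = frontier.pop()
            for target in sorted(self.relations[relation][current]):
                if target not in seen:
                    seen.append(target)
                    frontier.append(target)
        return seen


kb = LexicalKB()
kb.add_relation("thumb", "is_a", "finger")
kb.add_relation("finger", "is_a", "appendage")
kb.add_relation("finger", "part_of", "hand")
kb.add_relation("hand", "part_of", "arm")

# Equivalents listed in the text; the language codes are assumed.
for language, word in [("en", "digit"), ("eu", "hatz"), ("es", "dedo"),
                       ("gl", "dixito"), ("ca", "dit")]:
    kb.add_word("finger", language, word)

if __name__ == "__main__":
    print(kb.closure("thumb", "is_a"))
    print(kb.closure("finger", "part_of"))
    print(kb.lexicalizations["finger"]["eu"])
```

Keeping each relation type in its own map makes it easy to ask separate questions, such as what a concept is a kind of and what it is a part of.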

This database is tremendously useful in various fields, such as machine translation, information extraction, disambiguation of word meanings and question answering systems. In machine translation, for example, the system has to understand which word it is translating, a task for which it needs a “semantic dictionary” of this type. “For a quality translation, it is necessary to be able to distinguish the most appropriate meaning from among the various ones,” stressed Agirre.

“Our aim (within the framework of the European QTLeap project) is to improve the quality of machine translations by using WordNet,” he pointed out.

Over the 2014-2015 academic year, the university Master’s degree in Language Analysis and Processing (LAP) that the IXA Group will be running at the UPV/EHU will cover the Basque WordNet and other language technologies used to develop similar applications.

Master’s in Language Analysis and Processing (LAP)

The aim of the University Master’s in Language Analysis and Processing is to analyse language and to learn about the techniques and applications available for processing it with the help of the computer.

This Master’s has been organised by the UPV/EHU’s IXA Group and is geared towards anybody who combines linguistics and computing: philologists and linguistics experts, computing and telecommunications engineers, mathematicians, translators, etc. To apply for it, it is enough to be in possession of a University degree, have some experience and display some interest in the subject.

The Master’s will take a year and a half and the classes will be held at the Computing Faculty of the UPV/EHU-University of the Basque Country. It will be possible to spread it over two or three academic years (to cater for working professionals).

The pre-registration period is already open, and applications will be accepted until June 30. For further information on the Master’s, please check out http://ixa.si.ehu.es/master/.

Koldo Mitxelena award for PhD theses to Arantxa Otegi
https://www.ehu.eus/ehusfera/ixa/2013/02/07/koldo-mitxelena-award-for-phd-theses-to-arantxa-otegi/
Thu, 07 Feb 2013 17:39:59 +0000

Last January our colleague Arantxa Otegi won the III Koldo Mitxelena Award for PhD Theses, organized by Euskaltzaindia (the Academy of the Basque Language) and the University of the Basque Country.

CONGRATULATIONS Arantxa!

Congratulations to her supervisors (Xabier Arregi and Eneko Agirre).

The title of this thesis is ‘Expansion for information retrieval: contribution of word sense disambiguation and semantic relatedness’.

The whole text is available here. This is the abstract:

Information retrieval (IR) aims at finding documents that satisfy the information need of a user. An IR system thus points the user to relevant documents, that is, those documents that contain the information they need as formulated in the query. Well-known search engines like Google and Yahoo are prime examples of IR systems.
A perfect IR system should retrieve only, and all, the relevant documents, rejecting the non-relevant ones. However, perfect retrieval systems do not exist. One of the main problems is the so-called vocabulary mismatch problem between query and documents: some documents might be relevant to the query even if the specific terms used differ substantially, and some documents might not be relevant to the query even if they have some terms in common. The former is because several words or phrases can be used to express the same idea or item (synonymy). The latter is caused by ambiguity, where one word can have more than one interpretation depending on the context. Owing to these facts, if an IR system relies only on terms occurring in both the query and the document when it comes to deciding whether a document is relevant, it might be difficult to find some of the interesting documents, and also to reject non-relevant documents. It seems fair to think that there will be more chances of successful retrieval if the meaning of the text is also taken into account.
Even though the vocabulary mismatch problem has been widely discussed in the literature from the early days of IR, it remains unsolved, and most search engines just ignore it. This PhD dissertation explores whether natural language processing (NLP) can be used to alleviate this problem.
In a nutshell, we expand queries and documents making use of two NLP techniques, word sense disambiguation and semantic relatedness. For each of these techniques we propose an expansion strategy, in which we obtain synonyms and other related words for the words in the query and documents. We also present, for each case, a method to combine the expansions and original words effectively in an IR system. Furthermore, as the expansion technique we propose is useful for translating queries and documents, we show how a cross-lingual information retrieval system could be improved using such an expansion technique.
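
The idea of combining original words with expansion words can be illustrated with a minimal sketch. Everything in it is assumed for the example: the hand-written `RELATED_WORDS` table stands in for the output of word sense disambiguation or semantic relatedness, only the query is expanded (the dissertation also expands documents), and the term-overlap score with a down-weighted expansion is a simplification, not the combination method of the thesis.

```python
import re

# Hypothetical table of related words; a real system would derive these
# from word sense disambiguation or semantic relatedness.
RELATED_WORDS = {
    "car": {"automobile", "vehicle"},
    "film": {"movie"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"\w+", text.lower()))


def expand(terms: set[str]) -> set[str]:
    """Return only the new terms contributed by the expansion table."""
    expanded = set()
    for term in terms:
        expanded |= RELATED_WORDS.get(term, set())
    return expanded - terms


def score(query: str, document: str, expansion_weight: float = 0.5) -> float:
    """Count matching terms; expansion matches count less than original ones."""
    query_terms = tokenize(query)
    doc_terms = tokenize(document)
    original_hits = len(query_terms & doc_terms)
    expansion_hits = len(expand(query_terms) & doc_terms)
    return original_hits + expansion_weight * expansion_hits


if __name__ == "__main__":
    # Without expansion this document shares no term with the query.
    print(score("cheap car", "A used automobile for sale", expansion_weight=0.0))
    print(score("cheap car", "A used automobile for sale"))
```

Weighting expansion terms below the original ones is one simple way to keep a noisy expansion from outranking exact matches.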

Our extensive experiments on three datasets show that the expansion methods explored in this dissertation help overcome the mismatch problem, consequently improving the effectiveness of an IR system.

Talk. Martha Palmer: Beyond Shallow Semantics (2012-10-08)
https://www.ehu.eus/ehusfera/ixa/2012/10/03/talk-martha-palmer/
Wed, 03 Oct 2012 15:10:21 +0000

Speaker: Martha Palmer, Department of Linguistics, University of Colorado (USA)
Title: Beyond Shallow Semantics
Date: October 8, 2012
Time: 16:00-19:00
Where: Computer Science Faculty, Room 3.2

Abstract

Shallow semantic analyzers, such as semantic role labelers and sense taggers, are increasing in accuracy and becoming commonplace. However, they only provide limited and local representations of words and individual predicate-argument structures. This talk will address some of the current opportunities and challenges in producing deeper, richer representations of coherent eventualities. Available resources, such as VerbNet, that can assist in this process will also be discussed, as well as some of their limitations.

Speaker’s bio

She is a Full Professor at the University of Colorado with joint appointments in Linguistics and Computer Science and is an Institute of Cognitive Science Faculty Fellow. She recently won a Boulder Faculty Assembly 2010 Research Award. Her research has been focused on trying to capture elements of the meanings of words that can comprise automatic representations of complex sentences and documents. Supervised machine learning techniques rely on vast amounts of annotated training data so she and her students are engaged in providing data with word sense tags and semantic role labels for English, Chinese, Arabic, Hindi, and Urdu, funded by DARPA and NSF. They also train automatic sense taggers and semantic role labelers, and extract bilingual lexicons from parallel corpora.

A more recent focus is the application of these methods to biomedical journal articles and clinical notes, funded by NIH. She is a co-editor for the Journal of Natural Language Engineering and for LiLT, Linguistic Issues in Language Technology, and on the CLJ Editorial Board. She is a past President of the Association for Computational Linguistics, past Chair of SIGLEX and SIGHAN, and was the Director of the 2011 Linguistics Institute held in Boulder, Colorado.

*SEM conference: Lexical and computational semantics
https://www.ehu.eus/ehusfera/ixa/2012/05/27/star_sem/
Sun, 27 May 2012 08:47:58 +0000

ACL special interest groups SIGLEX and SIGSEM are organising a joint conference on Lexical and computational semantics: *SEM

Eneko Agirre from the Ixa Group is the general chair in this conference.

The main goal of *SEM is to provide a stable forum for the growing number of NLP researchers working on different aspects of semantic processing, which has been scattered over a large array of small workshops and conferences.

The first edition of *SEM will be a two-day conference co-located with the North American Chapter of the Association for Computational Linguistics and Human Language Technologies 2012 conference (NAACL HLT), which will take place in Montreal, Canada.

Important dates:
 30 March 2012, Paper due date
 23 April 2012, Notification of acceptance
  4 May 2012, Camera-ready deadline
7-8 June 2012, Conference

For further information see *SEM
Talk. Daniele Pighin. Semantic Structures in Translation Ranking (2011/05/31)
https://www.ehu.eus/ehusfera/ixa/2011/05/27/daniele-pighin/
Fri, 27 May 2011 11:22:05 +0000

Speaker: Daniele Pighin
          NLPRG, TALP
          Technical University of Catalonia, UPC
Date: May 31, 2011
Time: 11:30
Where: Computer Science Faculty, Room 3.2

Title
   Automatic Projection of Semantic Structures:
      an Application to Pairwise Translation Ranking
 
Abstract
The ability to automatically assess the quality of translation
hypotheses is a key requirement towards the development of accurate and
dependable translation models. While it is largely agreed that proper
transfer of predicate-argument structures from source to target is a
very strong indicator of translation quality, especially in relation to
adequacy, the incorporation of this kind of information in the
Statistical Machine Translation (SMT) evaluation pipeline is still
limited to few and isolated cases.

We present a model for the inclusion of semantic role annotations in the
framework of confidence estimation for machine translation. The model
has several interesting properties:
   1) it only requires a linguistic processor on the (generally
well-formed) source side of the translation;
   2) it does not directly rely on properties of the translation model
(hence, it can be applied beyond phrase-based systems);
   3) it is inherently extendable to cope with different kinds of
sequential annotations, e.g., POS tags.
These features make it potentially appealing for system ranking,
translation re-ranking and user feedback evaluation. Preliminary
experiments in pairwise hypothesis ranking on five confidence estimation
benchmarks show that the model has the potential to capture salient
aspects of translation quality.

Invited talk: Computational Semantics and Pragmatics (Rodolfo Delmonte, 2011/01/17,18)
https://www.ehu.eus/ehusfera/ixa/2011/01/14/delmonte2011/
Fri, 14 Jan 2011 22:20:34 +0000

Speaker: Rodolfo Delmonte (Università Ca’ Foscari, Venice, Italy).
Date: January 17 and 18, 2011
Time: 16:00 – 19:30
Where: Computer Science Faculty

ABSTRACT
These two sessions cover some of the most important aspects of Computational Semantics and Pragmatics including:
* Lexical Representations and Argument Structure
* Parsing with constituency or dependency structure
* Co-reference resolution
* Underspecified arguments
* Argumentative structure, subjectivity, factuality and sentiment analysis
* Textual Entailment
The talks follow a linguistically motivated approach with the use of ontologies and similar resources to deal with co-reference or textual entailment tasks. The talks are accompanied by several applications and demonstrations.

SHORT BIO
Rodolfo Delmonte is Associate Professor of Computational Linguistics at the University of Venice, where he is in charge of the corresponding course at BA, MA and Ph.D. level. A specialist in experimental phonetics and computational linguistics, he presents his research work at major international conferences and publishes articles in international journals. He is a referee for and publishes in Speech Communication, International Journal of Speech Technologies, Journal of Natural Language Engineering and international conferences every year. He has been an invited speaker at a number of conferences, a teacher at international schools, and an invited professor in the last five years in Boulder, Colorado at the CLSR, in Besançon at the Centre Tesnière, and in Dallas at UTD. Hot topics of his latest research include the following: Implicit entities and antecedents of omitted and underspecified arguments; Argumentative Analysis, Subjectivity, Factuality and Sentiment Analysis.

project.cgm.unive.it/delmonte.html
