talk – Ixa Group. Language Technology.

Seminar. First steps towards Quechua’s processing. (2012/11/15)

Kepa Sarasola — Mon, 12 Nov 2012 20:00:50 +0000

Hugo and Richard visiting Aholab in Bilbao.

Speakers: Hugo Quispe and Richard Castro (Universidad UNSAAC of Cusco, Peru),
          Olatz Arregi, Xabier Artola and Kepa Sarasola (Ixa Group)
Title: Primera aproximación al procesamiento automático del Quechua
       (First steps towards Quechua’s processing.)
Date: November 15, 2012, Thursday
Time: 16:00–17:00
Where: Computer Science Faculty, Room 3.2

Abstract

Quechua (Runa Simi), the native language of the Inca culture in Peru, is a family of languages spoken in Latin America. Factors such as westernization, among others, have left Quechua a vulnerable language in danger of extinction.

A group of lecturers and researchers from the Ixa Group at UPV/EHU, together with UNSAAC in Cusco, Peru, is working to lay the foundations of what is intended to become the language engineering centre of Cusco. The aim is to develop the first basic resources and tools for the automatic processing of Quechua. The topics we are working on are: compiling a text corpus; a lexical database for the Quechua language (BDLQ) and future tools derived from it; using the FOMA tool for morphological analysis; and building a TTS system, as basic tools for processing the language.

In this way, the foundations for support and teamwork between the two universities have been consolidated, for the benefit of a language in a critical situation.

Hugo and Richard visiting Ixa Group in Donostia.

Quechua (Runa Simi) is a native South American language family and dialect cluster spoken primarily in the Andes. It is the most widely spoken language family of the indigenous peoples of the Americas, with probably some 8 to 10 million speakers in total. Like Basque, Quechua remains alive but has suffered continuous regression over the last centuries, and the region in which it is spoken is becoming smaller and smaller. As happened with Basque, Quechua was not an official language and was kept out of the educational system, the media and industrial environments. Today Quechua holds co-official status in Peru and Bolivia, even though it is not regulated. Despite several changes in recent years, Quechua is still associated with a lack of education and stigmatized as uneducated, rural, or linked to low economic and political power, as Basque was some years ago. Language technology may help the community of Quechua speakers and scholars to build a standard, opening a door for Quechua’s future in the digital world. Corpus tools, lexical databases and spelling checkers have proven useful in this way for other languages such as Basque.
The group created by Prof. Juan Cruz at UNSAAC in Cusco (Peru) has been collaborating with Ixa Group and Aholab since the beginning of 2012. In this seminar, Hugo Quispe and Richard Castro will present their work on the definition of a lexical database and a text-to-speech (TTS) system for Quechua.

The group of Cusco (January 2012)

TALK. JM Torres: Text Summarization (2012/11/09)

Kepa Sarasola — Mon, 05 Nov 2012 16:44:05 +0000

This week we are going to listen to a second talk, on Friday:

Speaker: Juan-Manuel Torres-Moreno
Head of research team Natural Language Processing (NLP)
at the Laboratoire Informatique d’Avignon (LIA)
Title: Text Summarization. Resumen automático de documentos: Algoritmos y tendencias futuras
       (Automatic document summarization: algorithms and future trends.)
Date: November 9, 2012, Friday
Time: 16:00
Where: Computer Science Faculty, Room 3.2

Abstract

Automatic text summarization is a discipline of natural language processing (NLP) whose goal is to compress textual records. This compression process entails a loss of information. Determining the relevance of the retained information is one of the main difficulties of the process. This seminar offers a historical overview of the different approaches, from the work of H. P. Luhn in 1958 to the latest research in NLP. The evaluation of summaries, a difficult open problem, will also be covered in its manual and automatic approaches. Several applications of automatic document summarization will be presented, as well as summaries of specialized documents (organic chemistry and biomedicine). Algorithms for single- and multi-document summarization, cross-lingual summarization and automatic sentence compression will also be presented.

TALK. A. Kilgarriff: Getting to Know Your Corpus (2012/11/07)

Kepa Sarasola — Sun, 04 Nov 2012 09:01:01 +0000

Speaker: Adam Kilgarriff (Lexical Computing Ltd., Brighton)
Title: Getting to Know Your Corpus.
Date: November 7, 2012, Wednesday
Time: 16:00
Where: Computer Science Faculty, Room 3.2

Abstract

Corpora are not easy to get a handle on. The usual way of getting to grips with text is to read it, but corpora are mostly too big to read (and not designed to be read). We show, with examples, how keyword lists (of one corpus vs. another) are a direct, practical and fascinating way to explore the characteristics of corpora, and of text types. Our method is to classify the top one hundred keywords of corpus1 vs. corpus2, and corpus2 vs. corpus1. This promptly reveals a range of contrasts between all the pairs of corpora we apply it to. We also present improved maths for keywords, and quantitative comparisons between corpora. All the methods discussed (and almost all of the corpora) are available in the Sketch Engine, a leading corpus query tool.
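
As a rough illustration of what a keyword list of one corpus vs. another can look like, here is a minimal Python sketch. It is not taken from the talk: the scoring (a ratio of per-million frequencies with an additive smoothing constant), the function names `per_million` and `keywords`, and the toy corpora are all assumptions made for this example. The "improved maths" presented in the talk may differ.

```python
from collections import Counter


def per_million(counts):
    """Convert raw counts to frequencies per million tokens."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def keywords(focus_tokens, reference_tokens, top=100, smoothing=1.0):
    """Rank words of the focus corpus against a reference corpus.

    Assumed score (hypothetical, for illustration only):
        (focus_freq + smoothing) / (reference_freq + smoothing)
    The smoothing constant avoids division by zero for words
    that do not occur in the reference corpus.
    """
    focus = per_million(Counter(focus_tokens))
    reference = per_million(Counter(reference_tokens))
    scores = {
        word: (freq + smoothing) / (reference.get(word, 0.0) + smoothing)
        for word, freq in focus.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda word: (-scores[word], word))[:top]


# Toy corpora, invented for this example.
corpus1 = "the cat sat on the mat the cat purred".split()
corpus2 = "the dog sat on the log the dog barked".split()

print(keywords(corpus1, corpus2, top=3))  # keywords of corpus1 vs. corpus2
print(keywords(corpus2, corpus1, top=3))  # keywords of corpus2 vs. corpus1
```

Running the comparison in both directions, as the abstract describes, gives two lists that can then be read and classified by hand. Words shared by both corpora at similar rates score close to 1 and fall to the bottom.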

Talk. Martha Palmer: Beyond Shallow Semantics (2012/10/08)

Kepa Sarasola — Wed, 03 Oct 2012 15:10:21 +0000

Speaker: Martha Palmer. Department of Linguistics, University of Colorado (USA)

Title: Beyond Shallow Semantics.
Date: October 8, 2012
Time: 16:00-19:00
Where: Computer Science Faculty, Room 3.2

Abstract

Shallow semantic analyzers, such as semantic role labelers and sense taggers, are increasing in accuracy and becoming commonplace. However, they only provide limited and local representations of words and individual predicate-argument structures. This talk will address some of the current opportunities and challenges in producing deeper, richer representations of coherent eventualities. Available resources, such as VerbNet, that can assist in this process will also be discussed, as well as some of their limitations.

Speaker’s bio

She is a Full Professor at the University of Colorado with joint appointments in Linguistics and Computer Science and is an Institute of Cognitive Science Faculty Fellow. She recently won a Boulder Faculty Assembly 2010 Research Award. Her research has been focused on trying to capture elements of the meanings of words that can comprise automatic representations of complex sentences and documents. Supervised machine learning techniques rely on vast amounts of annotated training data so she and her students are engaged in providing data with word sense tags and semantic role labels for English, Chinese, Arabic, Hindi, and Urdu, funded by DARPA and NSF. They also train automatic sense taggers and semantic role labelers, and extract bilingual lexicons from parallel corpora.

A more recent focus is the application of these methods to biomedical journal articles and clinical notes, funded by NIH. She is a co-editor for the Journal of Natural Language Engineering and for LiLT, Linguistic Issues in Language Technology, and on the CLJ Editorial Board. She is a past President of the Association for Computational Linguistics, past Chair of SIGLEX and SIGHAN, and was the Director of the 2011 Linguistics Institute held in Boulder, Colorado.

Talk. Atro Voutilainen. Dependency treebank for Finnish (2011/06/08)

Kepa Sarasola — Tue, 31 May 2011 11:30:43 +0000

Speaker: Atro Voutilainen (University of Helsinki)
Date: June 8, 2011
Time: 11:30
Where: Computer Science Faculty, Room 3.2

Title: Building a dependency treebank and other LRs for Finnish

Abstract

Research infrastructure FIN-CLARIN
- LR web service for R&D
- corpora, language models, software, open source
- FIN-CLARIN project
FinnTreeBank
- user needs
- grammar definition corpus
- a parsebank with dependency syntactic annotation
Tagging and dependency parsing
- Finnish
- linguistic modelling
- tools, technologies
- modelling methods: experiments, comparisons

Talk. Daniele Pighin. Semantic Structures in Translation Ranking (2011/05/31)

Kepa Sarasola — Fri, 27 May 2011 11:22:05 +0000

Speaker: Daniele Pighin
          NLPRG, TALP
          Technical University of Catalonia, UPC
Date: May 31, 2011
Time: 11:30
Where: Computer Science Faculty, Room 3.2

Title
   Automatic Projection of Semantic Structures:
      an Application to Pairwise Translation Ranking
 
Abstract
The ability to automatically assess the quality of translation
hypotheses is a key requirement towards the development of accurate and
dependable translation models. While it is largely agreed that proper
transfer of predicate-argument structures from source to target is a
very strong indicator of translation quality, especially in relation to
adequacy, the incorporation of this kind of information in the
Statistical Machine Translation (SMT) evaluation pipeline is still
limited to few and isolated cases.

We present a model for the inclusion of semantic role annotations in the
framework of confidence estimation for machine translation. The model
has several interesting properties:
   1) it only requires a linguistic processor on the (generally
well-formed) source side of the translation;
   2) it does not directly rely on properties of the translation model
(hence, it can be applied beyond phrase-based systems);
   3) it is inherently extendable to cope with different kinds of
sequential annotations, e.g., POS tags.
These features make it potentially appealing for system ranking,
translation re-ranking and user feedback evaluation. Preliminary
experiments in pairwise hypothesis ranking on five confidence estimation
benchmarks show that the model has the potential to capture salient
aspects of translation quality.

Talk. Giovanni Semeraro. Information Retrieval and Information Filtering: two battlefields for NLP techniques (2011/05/06)

Kepa Sarasola — Tue, 03 May 2011 13:47:28 +0000

Speakers: Giovanni Semeraro, Pasquale Lops, Marco de Gemmis
          Dipartimento di Informatica
          Università di Bari
Date: May 6, 2011
Time: 16:00
Where: Computer Science Faculty, Room 3.2


  "Information Retrieval and Information Filtering:
    Two battlefields for NLP techniques"

 Part 1: Introduction to basic concepts on:
 - Information Retrieval Models: Boolean, Vector space
 - Information Filtering techniques
 - Recommender Systems
 - Problems with classical information seeking strategies
 Speaker: Giovanni Semeraro
 Expected duration: 75 min.

 Part 2: Intelligent Information Access:
 - Semantic Indexing using external knowledge sources: WordNet, Wikipedia
 - Semantic Indexing for multilingual access
 Speaker: Pasquale Lops
 Expected duration: 45 min.

 - Knowledge Infusion (KI): creating a knowledge base from open knowledge sources
 - KI at work: solving a challenging language game
 - KI applications for recommender systems
 Speaker: Marco de Gemmis
 Expected duration: 45 min.

Roser Morante’s talk: Modality and negation in natural language processing (2011/02/23)

Kepa Sarasola — Tue, 15 Feb 2011 12:42:59 +0000

Speaker: Roser Morante
         Senior researcher on the BIOGRAPH project led by Walter Daelemans.
         CLiPS-Computational Linguistics research group
         University of Antwerp
Date: February 23, 2011
Time: 16:00
Where: Computer Science Faculty, Meeting room (batzar aretoa).

Modality and negation in natural language processing: current trends and future directions

Summary:
Research on modality and negation focuses on finding subjective,
uncertain and counterfactual information in texts, be it in scientific
papers, product reviews, or opinions in blogs. This type of research is
concerned with processing texts at the information level and aims at
deep text understanding.  Modality and negation are phenomena relevant
for all applications that are concerned with some form of text
understanding, including text mining, sentiment analysis, recognizing
textual entailment, information extraction, text summarization, and
question answering. Hence, the adequate modeling of these phenomena is
of crucial importance to the natural language processing (NLP) community
as a whole.

Whereas from a theoretical perspective, the study of modality has a long
tradition, only in recent years have these topics attracted the
attention of NLP researchers. Mainly, the development of sentiment
analysis techniques and the growing need for mining biomedical texts have
been the causes for the interest in these semantic aspects of language.
In this talk I will define modality and negation from an NLP
perspective, I will motivate the need for processing these phenomena,
and I will summarize existing research on processing modality and
negation, touching on diverse aspects ranging from task modelling to
feature visualization. Finally, I will speculate about future
developments in this research area.

Invited talk: Computational Semantics and Pragmatics (Rodolfo Delmonte, 2011/01/17,18)

Kepa Sarasola — Fri, 14 Jan 2011 22:20:34 +0000

Speaker: Rodolfo Delmonte (Università Ca’ Foscari, Venice, Italy).
Date: January 17 and 18, 2011
Time: 16:00 – 19:30
Where: Computer Science Faculty

ABSTRACT
These two sessions cover some of the most important aspects of Computational Semantics and Pragmatics including:
* Lexical Representations and Argument Structure
* Parsing with constituency or dependency structure
* Co-reference resolution
* Underspecified arguments
* Argumentative structure, subjectivity, factuality and sentiment analysis
* Textual Entailment
The talks follow a linguistically motivated approach with the use of ontologies and similar resources to deal with co-reference or textual entailment tasks. The talks are accompanied by several applications and demonstrations.

SHORT BIO
Rodolfo Delmonte is Associate Professor of Computational Linguistics at the University of Venice, where he is in charge of the corresponding course at BA, MA and Ph.D. level. A specialist in experimental phonetics and computational linguistics, he presents his research work at major international conferences and publishes articles in international journals. He is a referee for, and publishes in, Speech Communication, International Journal of Speech Technologies, Journal of Natural Language Engineering and international conferences every year. He has been an invited speaker at a number of conferences, a teacher at international schools, and, in the last five years, an invited professor in Boulder, Colorado at the CLSR, in Besançon at the Centre Tesnière, and in Dallas at UTD. Hot topics of his latest research include the following: Implicit entities and antecedents of omitted and underspecified arguments; Argumentative Analysis, Subjectivity, Factuality and Sentiment Analysis.

project.cgm.unive.it/delmonte.html

Wauter Bosma: Contextual salience in query-based summarization (2010/10/22)

Kepa Sarasola — Fri, 15 Oct 2010 22:19:21 +0000

Speaker: Wauter Bosma (Vrije Universiteit Amsterdam)
Date: Oct 22, 2010
Time: 15:00
Where: Computer Science Faculty, Room 2.2

Wauter Bosma is currently working as a postdoc on the European KYOTO project (where Ixa Group is another partner) at the Vrije Universiteit Amsterdam. His main research interests are in the area of Natural Language Processing, and in particular text mining, terminology extraction and automatic summarization. In 2008 he received his PhD from the University of Twente on ‘Discourse-oriented summarization’.

Discourse theories claim that text gets meaning in context. Most summarization systems do not take advantage of this. They assess the relevance of each passage individually rather than modeling the way context affects the relevance of passages. In order to model relations in text, I developed a framework for graph-based summarization, so that the passages can be viewed in a broader context. The result is a summarization system which is more in line with discourse theory but still fully automatic. I evaluated the content selection performance of an implementation of the framework in different configurations. The system significantly outperforms a competitive baseline (and participant systems) on the DUC 2005 evaluation set.
