Discourse – Ixa Group. Language Technology.
https://www.ehu.eus/ehusfera/ixa
News from the Ixa Group in the University of the Basque Country

Talk: Discourse Structure in Machine Translation Evaluation (L. Marquez, 2015/06/25)
https://www.ehu.eus/ehusfera/ixa/2015/06/23/talk-discourse-structure-in-machine-translation-evaluation-l-marquez-20150615/
Tue, 23 Jun 2015 14:19:03 +0000

Speaker: Lluis Màrquez, Arabic Language Technologies group, Qatar Computing Research Institute (QCRI)
Date: Thursday, June 25th 2015
Time: 12:00
Room: 3.2, Faculty of Informatics (UPV/EHU)
Title: “Discourse Structure in Machine Translation Evaluation”

Abstract:

In this talk I will describe our research at the Arabic Language Technologies group of the Qatar Computing Research Institute on applying discourse-level information to automatic machine translation (MT) evaluation.
I will start by describing some variants of a discourse-aware similarity measure, which uses the “all-subtree” convolution kernel to compare discourse parse trees built in accordance with Rhetorical Structure Theory. Then, I will show that these measures improve a number of existing MT evaluation metrics, both at the segment and at the system level, by increasing the correlation with human judgments. This indicates that discourse information is complementary to the state-of-the-art metrics, and thus could be taken into account in the development of richer evaluation measures.
In the second part I will present a strong and robust evaluation measure that combines the discourse-based similarity with other metrics from the Asiya MT evaluation toolkit and tunes the weights of the combination on actual human judgments. Experiments on the WMT12, WMT13, and WMT14 metrics shared task datasets show correlations with human judgments that exceed those of the state of the art, both at the segment and at the system level, with very consistent results across language pairs.
In the final part of the talk, I will introduce two preliminary attempts at learning metrics from finer-grained features for pairwise quality comparison. In the first one, we use preference reranking with kernels to learn from tree-structured representations. In the second one, we use a neural network architecture to learn from a distributed representation of syntax and semantics. Both frameworks are designed to be general and extensible from MT evaluation to quality estimation and machine translation.
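
As an illustration of the kind of tree comparison the abstract refers to, the following is a minimal sketch of an all-subtree convolution kernel in the style of Collins and Duffy. It is not the speaker's implementation: the nested-tuple tree encoding, the relation labels, the segment names `e1`, `e2`, `e3`, and the function names are all assumptions made for this example.

```python
import math

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf is a plain string. Every tree must have at least one internal node.

def node_label(node):
    """Return the label of an internal node (tuple) or of a leaf (string)."""
    return node if isinstance(node, str) else node[0]

def production(node):
    """Node label plus the label and leaf/internal status of each child."""
    return (node[0], tuple((node_label(c), isinstance(c, str)) for c in node[1:]))

def internal_nodes(tree):
    """Collect all internal (non-leaf) nodes of a tree."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes

def common_fragments(n1, n2, decay=1.0):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Equal productions guarantee c1 and c2 are both leaves or both internal.
        if not isinstance(c1, str):
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score

def subtree_kernel(t1, t2, decay=1.0):
    """Sum the shared-fragment counts over all pairs of internal nodes."""
    return sum(common_fragments(a, b, decay)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

def similarity(t1, t2, decay=1.0):
    """Kernel value normalised to the range [0, 1]."""
    return subtree_kernel(t1, t2, decay) / math.sqrt(
        subtree_kernel(t1, t1, decay) * subtree_kernel(t2, t2, decay))

# Hypothetical discourse trees for a reference and a candidate translation.
reference = ("Elaboration", ("Nucleus", "e1"), ("Satellite", "e2"))
candidate = ("Elaboration", ("Nucleus", "e1"), ("Satellite", "e3"))
print(similarity(reference, candidate))
```

The `decay` parameter down-weights larger fragments when set below 1; with the default of 1 the kernel simply counts shared fragments.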

Short bio:

Principal Scientist at the Arabic Language Technologies group of the Qatar Computing Research Institute (QCRI) since 2013. Previously, Associate Professor at the Technical University of Catalonia (UPC, 2000-2013). He holds a PhD in Computer Science from UPC (1999). His research focuses on Machine Learning methods for Natural Language structure prediction problems, including syntactic and semantic parsing. He works on applications in statistical machine translation and its evaluation, and question answering in community forums. He has 120+ papers in Natural Language Processing and Machine Learning journals and conferences. He has been General and Program Co-chair of major conferences in the area (EMNLP, EACL, CoNLL, EAMT, etc.), and has held several organizational roles in ACL and EMNLP. He was co-organizer of various international evaluation tasks at Senseval/SemEval (2004, 2007, 2010, 2015) and CoNLL shared tasks (2004-2005, 2008-2009). Secretary and President of the ACL SIG on Natural Language Learning (SIGNLL) in the period 2007-2011, he currently serves as President of the European Chapter of the ACL (2015-2017). He has been Guest Editor of special issues at Computational Linguistics, LRE, JNLE, and JAIR in the period 2007-2015. He has participated in 18 national and EU research projects, acting as the principal site researcher in 10 of them.

Talk: Discourse organization studies at Universidade Estadual de Maringá (J. Desiderato, 2015/06/04)
https://www.ehu.eus/ehusfera/ixa/2015/06/03/talk-discourse-organization-studies-at-universidade-estadual-de-maringa-j-desiderato-20150604/
Wed, 03 Jun 2015 18:12:59 +0000

Speaker: Juliano Desiderato, Universidade Estadual de Maringá, DTL (Brazil)
Date: Thursday, June 4th 2015
Time: 15:00
Room: 3.2, Faculty of Informatics (UPV/EHU)
Title: “Discourse organization studies at Universidade Estadual de Maringá”

Abstract:

The talk presents the discourse organization studies conducted by the Group of Functionalist Investigations at Universidade Estadual de Maringá (Brazil). The investigations concern both spoken and written Brazilian Portuguese and describe the rhetorical structure of diverse genres. The programme of the talk is the following:

  • Spoken language strategies
  • Discourse markers
  • Rhetorical structure of undergraduate lectures
  • Rhetorical structure of biblical sermons
  • Rhetorical structure of narratives
  • Rhetorical structure of personal experience reports
  • Rhetorical structure of argumentative answers

Short bio:

Juliano Desiderato Antonio is associate professor at Universidade Estadual de Maringá, Brazil. The theme of his master’s dissertation (1998) was preferred argument structure in narratives in Brazilian Portuguese. His doctoral thesis (2004) concerns the rhetorical structure of narratives in Brazilian Portuguese and its manifestation via hypotaxis. He conducted a post-doctoral investigation (2011) to characterize some parameters of Functional Discourse Grammar as signals of hypotactic rhetorical relations. At UEM he teaches Linguistics in undergraduate classes and Functional Linguistics in postgraduate classes, where he also advises master’s dissertations and doctoral theses. From a functional point of view, his investigations address the discourse structure of diverse genres, spoken language, clause combining, and grammar and discourse.

Talk: Text summarization using discourse knowledge. Text simplification and co-reference (T. Pardo)
https://www.ehu.eus/ehusfera/ixa/2014/02/25/talk-text-summarization-using-discourse-knowledge-text-simplification-and-co-reference-t-pardo-feb-28/
Tue, 25 Feb 2014 17:22:33 +0000

Speaker: Thiago Pardo
Date: Friday, February 28th 2014
Time: 10:30
Room: 3.1, Computer Science Faculty (UPV/EHU)

Title: “Text summarization using discourse knowledge. Text simplification and co-reference”

Abstract:

Thiago A.S. Pardo has also developed many systems for text summarization. For example, the following:

  • Summarization extension to Google Chrome – extension for on-line news summarization, based on RSumm system
  • TextTiling for Portuguese – topical segmentation tool adapted to news texts in Brazilian Portuguese, based on the work of Hearst (1997)
  • CSTSumm – a multi-document summarizer based on CST information (see README.txt in the rar file)
  • CSTNews – a corpus with 50 clusters of news texts – in Portuguese – with their multi-document summaries, as well as several discourse and semantic annotations
  • TeMário 2006 – 150 news texts and the corresponding human summaries, which complement the original TeMário corpus, resulting in a corpus of 250 texts for summarization purposes
  • DMSumm – Discourse Modeling SUMMarizer
  • NeuralSumm – NEURAL network for SUMMarization (for scientific texts) – with tools for training the system with new data, if necessary
  • GistSumm – GIST SUMMarizer

Talk: V. Kordoni. Automated Annotation and Acquisition of Linguistic Knowledge (2011/11/25)
https://www.ehu.eus/ehusfera/ixa/2011/11/15/talk-v-kordoni/
Tue, 15 Nov 2011 12:02:04 +0000

Speaker: Valia Kordoni (LT-Lab DFKI GmbH & Dept. of Computational Linguistics, Saarland University)
Title: Automated Annotation and Acquisition of Linguistic Knowledge for Efficient Multilingual Grammar Engineering
Date: November 25, 2011
Time: 16:00-18:00
Where: Computer Science Faculty, Room 3.2

Abstract

In this talk, I mainly deal with automated acquisition of linguistic knowledge as a means of enhancing the robustness of lexicalised grammars for real-life applications. The case study I focus on for most of the talk is Multiword Expressions (henceforth MWEs). Specifically, in the first part of the talk I take a closer look at the linguistic properties of MWEs, in particular their lexical, syntactic, and semantic characteristics. The term Multiword Expressions has been used to describe expressions for which the syntactic or semantic properties of the whole expression cannot be derived from its parts (cf. Sag et al., 2002), including a large number of related but distinct phenomena, such as phrasal verbs (e.g., “come along”), nominal compounds (e.g., “frying pan”), institutionalised phrases (e.g., “bread and butter”), and many others. Jackendoff (1997) estimates the number of MWEs in a speaker’s lexicon to be comparable to the number of single words.
However, due to their heterogeneous characteristics, MWEs present a tough challenge for both linguistic and computational work (cf. Sag et al., 2002).
For instance, some MWEs are fixed and do not present internal variation, such as “ad hoc”, while others allow different degrees of internal variability and modification, such as “spill beans” (“spill several/musical/mountains of beans”). With the observations about the linguistic properties of MWEs at hand, I turn in the second part of the talk to methods for the automated acquisition of these properties for robust grammar engineering. To this effect, I first investigate the hypothesis that MWEs can be detected by the distinct statistical properties of their component words, regardless of their type, comparing various statistical measures, a procedure which leads to extremely interesting conclusions. I then investigate the influence of the size and quality of different corpora, using the BNC and the Web search engines Google and Yahoo. I conclude that, in terms of language usage, web-generated corpora are fairly similar to more carefully built corpora, like the BNC, indicating that the lack of control and balance of these corpora is probably compensated for by their size.
Then, I show a qualitative evaluation of the results of automatically adding extracted MWEs to existing linguistic resources. To this effect, I first discuss two main approaches commonly employed in NLP for treating MWEs: the words-with-spaces approach, which models an MWE as a single lexical entry and can adequately capture fixed MWEs like “by and large”, and compositional approaches, which treat MWEs by general and compositional methods of linguistic analysis and are able to capture more syntactically flexible MWEs, like “rock boat”, which cannot be satisfactorily captured by a words-with-spaces approach, since this would require lexical entries to be added for all the possible variations of an MWE (e.g., “rock/rocks/rocking this/that/his…boat”). On this basis, I argue that the automatic addition of extracted MWEs to existing linguistic resources improves qualitatively if a more compositional approach to automated grammar/lexicon extension is adopted.
Finally, I propose that the methods developed for the acquisition of linguistic knowledge in the case of English MWEs can be tuned to enhance the robustness of lexicalised grammars for languages with richer morphology and freer word order, such as German, and can benefit from gold-standard syntactically and semantically annotated corpora. For the (semi-automated) development of such corpora, I briefly show a very simple statistical ranking model which significantly improves treebanking efficiency by directing human annotators to the most relevant linguistic annotation decisions.
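
The abstract does not say which statistical measures are compared. As one hedged illustration of scoring MWE candidates by the statistical properties of their component words, the sketch below computes pointwise mutual information (PMI) for adjacent word pairs. PMI is chosen here only because it is a commonly used association measure; the function name `bigram_pmi` and the toy token list are assumptions made for this example.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Return PMI (in bits) for every adjacent word pair in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams          # observed pair probability
        p_w1 = unigrams[w1] / n_unigrams    # marginal probability of w1
        p_w2 = unigrams[w2] / n_unigrams    # marginal probability of w2
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

# Toy corpus, invented for illustration only.
tokens = "ad hoc rules and ad hoc fixes and more rules".split()
scores = bigram_pmi(tokens)
for pair in sorted(scores, key=scores.get, reverse=True):
    print(pair, round(scores[pair], 2))
```

PMI is known to favour rare pairs, so on realistic corpora it is usually combined with a frequency threshold or compared against other measures.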

Roser Morante’s talk: Modality and negation in natural language processing (2011/02/23)
https://www.ehu.eus/ehusfera/ixa/2011/02/15/roser-morantes-talk-modality-and-negation-in-natural-language-processing-20110223/
Tue, 15 Feb 2011 12:42:59 +0000

Speaker: Roser Morante, senior researcher on the BIOGRAPH project led by Walter Daelemans, CLiPS Computational Linguistics research group, University of Antwerp
Date: February 23, 2011
Time: 16:00
Where: Computer Science Faculty, Meeting room (batzar aretoa)

Modality and negation in natural language processing: current trends and future directions

Summary:
Research on modality and negation focuses on finding subjective,
uncertain and counterfactual information in texts, be it in scientific
papers, product reviews, or opinions in blogs. This type of research is
concerned with processing texts at the information level and aims at
deep text understanding. Modality and negation are phenomena relevant
for all applications that are concerned with some form of text
understanding, including text mining, sentiment analysis, recognizing
textual entailment, information extraction, text summarization, and
question answering. Hence, the adequate modeling of these phenomena is
of crucial importance to the natural language processing (NLP) community
as a whole.

Whereas from a theoretical perspective the study of modality has a long
tradition, only in recent years have these topics attracted the
attention of NLP researchers. The development of sentiment analysis
techniques and the growing need for mining biomedical texts have been
the main causes of the interest in these semantic aspects of language.
In this talk I will define modality and negation from an NLP
perspective, I will motivate the need for processing these phenomena,
and I will summarize existing research on processing modality and
negation, touching on diverse aspects ranging from task modelling to
feature visualization. Finally, I will speculate about future
developments in this research area.

Wauter Bosma: Contextual salience in query-based summarization (2010/10/22)
https://www.ehu.eus/ehusfera/ixa/2010/10/15/wauter-bosma-contextual-salience-in-query-based-summarization-20101022/
Fri, 15 Oct 2010 22:19:21 +0000

Speaker: Wauter Bosma (Vrije Universiteit Amsterdam)
Date: Oct 22, 2010
Time: 15:00
Where: Computer Science Faculty, room 2.2

Wauter Bosma is currently working as a postdoc on the European KYOTO project (in which the Ixa group is also a partner) at the Vrije Universiteit Amsterdam. His main research interests are in the area of Natural Language Processing, and in particular text mining, terminology extraction and automatic summarization. In 2008 he received his PhD from the University of Twente on ‘Discourse-oriented summarization’.

Discourse theories claim that text gets meaning in context. Most summarization systems do not take advantage of this. They assess the relevance of each passage individually rather than modeling the way context affects the relevance of passages. In order to model relations in text, I developed a framework for graph-based summarization, so that the passages can be viewed in a broader context. The result is a summarization system which is more in line with discourse theory but still fully automatic. I evaluated the content selection performance of an implementation of the framework in different configurations. The system significantly outperforms a competitive baseline (and participant systems) on the DUC 2005 evaluation set.
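
To make the idea of context affecting passage relevance concrete, here is a minimal sketch in which each passage starts from its own query relevance and then receives part of the score of the passages linked to it. This is not the framework presented in the talk: the update rule, the `damping` parameter, the function name `contextual_salience`, and the toy passage graph are all assumptions made for this example.

```python
def contextual_salience(query_relevance, edges, damping=0.5, iterations=20):
    """Spread query relevance along weighted, directed passage relations.

    query_relevance: dict mapping passage id to its standalone relevance.
    edges: dict mapping (source, target) to a relation weight.
    """
    scores = dict(query_relevance)
    for _ in range(iterations):
        updated = {}
        for node, base in query_relevance.items():
            # Score flowing in from passages that point to this one.
            incoming = sum(weight * scores[src]
                           for (src, dst), weight in edges.items()
                           if dst == node)
            updated[node] = (1 - damping) * base + damping * incoming
        scores = updated
    return scores

# Toy example: only p1 matches the query, and p1 is related to p2.
relevance = {"p1": 1.0, "p2": 0.0, "p3": 0.0}
relations = {("p1", "p2"): 1.0}
salience = contextual_salience(relevance, relations)
print(sorted(salience, key=salience.get, reverse=True))
```

In this toy graph, p2 gains salience purely from its relation to p1, while the unrelated p3 stays at zero, which is the effect that scoring each passage individually cannot capture.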
