Evaluation – Ixa Group. Language Technology.

Talk: Discourse Structure in Machine Translation Evaluation (L. Màrquez, 2015/06/25)

Kepa Sarasola — Tue, 23 Jun 2015 14:19:03 +0000

Speaker: Lluís Màrquez
         Arabic Language Technologies group, Qatar Computing Research Institute (QCRI)
Date: Thursday, June 25, 2015
Time: 12:00
Room: Room 3.2, Faculty of Informatics (UPV/EHU)
Title: “Discourse Structure in Machine Translation Evaluation”

Abstract:

In this talk I will describe our research at the Arabic Language Technologies group of the Qatar Computing Research Institute on applying discourse-level information to automatic machine translation (MT) evaluation.
I will start by describing some variants of a discourse-aware similarity measure, which uses the 'all-subtree' convolution kernel to compare discourse parse trees built according to Rhetorical Structure Theory (RST). Then, I will show that these measures improve a number of existing MT evaluation metrics, at both the segment and the system level, by increasing their correlation with human judgments. This indicates that discourse information is complementary to state-of-the-art metrics and could therefore be taken into account when developing richer evaluation measures.
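To make the tree comparison more concrete, here is a minimal Python sketch of a convolution tree kernel that counts the tree fragments two trees share. It assumes the 'all-subtree' kernel follows the well-known Collins and Duffy (2001) formulation: a recursive count over nodes with matching productions, weighted by a decay factor `lam`, then normalized so that identical trees score 1.0. The tuple-based tree encoding, the function names, and the toy labels are hypothetical, and real RST trees carry richer relation and nuclearity information than this toy example.

```python
from typing import Dict, Iterator, Tuple, Union

# Hypothetical encoding: a tree is a tuple (label, child, child, ...);
# leaves are plain strings (e.g. words or EDU identifiers).
Tree = Union[str, tuple]


def production(node: tuple) -> Tuple[str, Tuple[str, ...]]:
    """The production at a node: its label plus the labels of its children."""
    label, *children = node
    return label, tuple(c if isinstance(c, str) else c[0] for c in children)


def internal_nodes(tree: Tree) -> Iterator[tuple]:
    """Yield every non-leaf node of the tree in pre-order."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)


def subtree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Count (decay-weighted) tree fragments shared by t1 and t2."""
    memo: Dict[Tuple[int, int], float] = {}

    def common(n1: Tree, n2: Tree) -> float:
        # Leaves are not fragment roots; they contribute a factor of 1 below.
        if isinstance(n1, str) or isinstance(n2, str):
            return 0.0
        key = (id(n1), id(n2))
        if key not in memo:
            if production(n1) != production(n2):
                memo[key] = 0.0
            else:
                # Each child pair may be cut off (the 1) or expanded further.
                value = lam
                for c1, c2 in zip(n1[1:], n2[1:]):
                    value *= 1.0 + common(c1, c2)
                memo[key] = value
        return memo[key]

    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


def similarity(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Kernel normalized to [0, 1], so identical trees score 1.0."""
    denom = (subtree_kernel(t1, t1, lam) * subtree_kernel(t2, t2, lam)) ** 0.5
    return subtree_kernel(t1, t2, lam) / denom if denom else 0.0


# Toy discourse trees for a reference and a hypothesis (invented labels).
reference = ("Elaboration", ("Nucleus", "e1"), ("Satellite", "e2"))
hypothesis = ("Elaboration", ("Nucleus", "e1"), ("Satellite", "e3"))
print(subtree_kernel(reference, hypothesis))  # 3.0
print(similarity(reference, hypothesis))      # 0.5
```

Normalizing by the self-similarities keeps larger trees from scoring higher simply because they contain more fragments.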
In the second part, I will present a strong and robust evaluation measure that combines the discourse-based similarity with other metrics from the Asiya MT evaluation toolkit, with the combination weights tuned on actual human judgments. Experiments on the WMT12, WMT13, and WMT14 metrics shared task datasets show correlations with human judgments that outperform the state of the art at both the segment and the system level, with very consistent results across language pairs.
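As an illustration of the combination idea, the sketch below scores each hypothesis with a weighted sum of individual metric scores and evaluates the result with a segment-level Kendall's tau-like statistic over pairwise human preferences, in the spirit of the WMT metrics tasks. The metric names, scores, and hand-set weights are invented. The talk tunes the weights on human judgments, and this sketch does not reproduce that tuning. Tie handling also varied between WMT editions; here a metric tie simply counts as discordant.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-hypothesis scores from individual metrics.
Scores = Dict[str, float]


def combine(scores: Scores, weights: Dict[str, float]) -> float:
    """Weighted linear combination of individual metric scores."""
    return sum(w * scores[name] for name, w in weights.items())


def kendall_tau_like(human_pairs: List[Tuple[str, str]],
                     score: Callable[[str], float]) -> float:
    """Kendall's tau-like statistic over pairwise human judgments.

    Each pair is (better, worse) according to a human; human ties are
    assumed to be filtered out beforehand. Metric ties count as discordant.
    """
    concordant = discordant = 0
    for better, worse in human_pairs:
        if score(better) > score(worse):
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Toy data, invented for illustration only.
metric_scores: Dict[str, Scores] = {
    "h1": {"BLEU": 0.30, "discourse": 0.80},
    "h2": {"BLEU": 0.20, "discourse": 0.60},
    "h3": {"BLEU": 0.40, "discourse": 0.00},
}
weights = {"BLEU": 1.0, "discourse": 0.5}  # hand-set, not tuned
human_pairs = [("h1", "h2"), ("h1", "h3"), ("h3", "h2")]

tau = kendall_tau_like(human_pairs, lambda h: combine(metric_scores[h], weights))
print(round(tau, 3))  # 0.333
```

In a tuned system, the weights would be chosen to maximize agreement with human judgments on held-out data rather than set by hand.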
In the final part of the talk, I will introduce two preliminary attempts at learning metrics from finer-grained features for pairwise quality comparison. In the first, we use preference reranking with kernels to learn from tree-structured representations. In the second, we use a neural network architecture to learn from distributed representations of syntax and semantics. Both frameworks are designed to be general and extensible from MT evaluation to quality estimation and to machine translation itself.
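Preference reranking over pairs is commonly built on a preference kernel, which compares two ordered pairs of hypotheses through a base kernel k: K((a1, b1), (a2, b2)) = k(a1, a2) + k(b1, b2) - k(a1, b2) - k(b1, a2). The sketch below assumes that construction. It uses a linear base kernel on toy feature vectors where the talk's approach would use kernels over tree-structured representations, and, purely for brevity, it trains a kernel perceptron instead of an SVM. All names and data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (hypothesis A, hypothesis B)
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Stand-in base kernel; a tree kernel could be plugged in instead."""
    return sum(a * b for a, b in zip(x, y))


def preference_kernel(p1: Pair, p2: Pair, k: Kernel = linear_kernel) -> float:
    """Kernel between two ordered pairs, built from the base kernel k."""
    a1, b1 = p1
    a2, b2 = p2
    return k(a1, a2) + k(b1, b2) - k(a1, b2) - k(b1, a2)


def decision(pairs: List[Pair], labels: List[int],
             alphas: List[float], p: Pair) -> float:
    """Positive score means 'A is better than B' for the pair p."""
    return sum(a * y * preference_kernel(q, p)
               for a, y, q in zip(alphas, labels, pairs) if a)


def train(pairs: List[Pair], labels: List[int], epochs: int = 10) -> List[float]:
    """Kernel perceptron; label +1 means A is better, -1 means B is better."""
    alphas = [0.0] * len(pairs)
    for _ in range(epochs):
        for i, (p, y) in enumerate(zip(pairs, labels)):
            if y * decision(pairs, labels, alphas, p) <= 0:
                alphas[i] += 1.0  # mistake-driven update
    return alphas


# Toy two-feature vectors per hypothesis, invented for illustration.
good, bad = [0.9, 0.8], [0.2, 0.3]
pairs = [(good, bad), (bad, good)]  # both orderings of one judgment
labels = [+1, -1]
alphas = train(pairs, labels)
new = ([0.7, 0.6], [0.3, 0.2])
print(decision(pairs, labels, alphas, new) > 0)  # True
```

With a linear base kernel the preference kernel reduces to a dot product of the difference vectors (a1 - b1) and (a2 - b2), which is why swapping a pair flips the sign of the decision.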

Short bio:

Lluís Màrquez has been a Principal Scientist in the Arabic Language Technologies group of the Qatar Computing Research Institute (QCRI) since 2013. Previously, he was an Associate Professor at the Technical University of Catalonia (UPC, 2000-2013). He holds a PhD in Computer Science from UPC (1999). His research focuses on machine learning methods for natural language structure prediction problems, including syntactic and semantic parsing. He works on applications to statistical machine translation and its evaluation, and to question answering in community forums. He has published more than 120 papers in Natural Language Processing and Machine Learning journals and conferences. He has been General and Program Co-chair of major conferences in the area (EMNLP, EACL, CoNLL, EAMT, etc.) and has held several organizational roles at ACL and EMNLP. He co-organized several international evaluation tasks at Senseval/SemEval (2004, 2007, 2010, 2015) and the CoNLL shared tasks (2004-2005, 2008-2009). He served as Secretary and President of the ACL SIG on Natural Language Learning (SIGNLL) in 2007-2011, and he currently serves as President of the European Chapter of the ACL (2015-2017). Between 2007 and 2015 he was Guest Editor of special issues of Computational Linguistics, LRE, JNLE, and JAIR. He has participated in 18 national and EU research projects, acting as the principal site researcher in 10 of them.

Talk. Daniele Pighin. Semantic Structures in Translation Ranking (2011/05/31)

Kepa Sarasola — Fri, 27 May 2011 11:22:05 +0000

Speaker: Daniele Pighin
          NLPRG, TALP
          Technical University of Catalonia, UPC
Date: May 31, 2011
Time: 11:30
Where: Computer Science Faculty, Room 3.2

Title
   Automatic Projection of Semantic Structures:
      an Application to Pairwise Translation Ranking
 
Abstract
The ability to automatically assess the quality of translation
hypotheses is a key requirement for developing accurate and
dependable translation models. While it is widely agreed that proper
transfer of predicate-argument structures from source to target is a
very strong indicator of translation quality, especially of
adequacy, the incorporation of this kind of information into the
Statistical Machine Translation (SMT) evaluation pipeline is still
limited to a few isolated cases.

We present a model for the inclusion of semantic role annotations in the
framework of confidence estimation for machine translation. The model
has several interesting properties:
   1) it only requires a linguistic processor on the (generally
well-formed) source side of the translation;
   2) it does not directly rely on properties of the translation model
(hence, it can be applied beyond phrase-based systems);
   3) it is inherently extensible to other kinds of
sequential annotations, e.g., POS tags.
These features make it potentially appealing for system ranking,
translation re-ranking and user feedback evaluation. Preliminary
experiments in pairwise hypothesis ranking on five confidence estimation
benchmarks show that the model has the potential to capture salient
aspects of translation quality.
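One simple way to picture projecting source-side semantic roles onto a translation is through word alignments: each predicate or argument span is mapped to the smallest target span covering its aligned tokens. The sketch below is an illustration under stated assumptions, not the model presented in the talk. It presumes that a word alignment between the source and each hypothesis is available (for example from an external aligner), uses hand-written PropBank-style role spans, and reduces everything to a single coverage score. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical helpers, not the talk's model.
Span = Tuple[int, int]             # inclusive token indices
Alignment = List[Tuple[int, int]]  # (source index, target index) links


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Smallest target span covering every token aligned to the source span."""
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    return (min(targets), max(targets)) if targets else None


def project_roles(roles: Dict[str, Span],
                  alignment: Alignment) -> Dict[str, Optional[Span]]:
    """Project every source-side role span onto the translation."""
    return {role: project_span(span, alignment) for role, span in roles.items()}


def coverage(roles: Dict[str, Span], alignment: Alignment) -> float:
    """Fraction of source roles that receive a non-empty projection."""
    projected = project_roles(roles, alignment)
    if not projected:
        return 0.0
    return sum(p is not None for p in projected.values()) / len(projected)


# Source: "John gave Mary a book" (tokens 0-4), with hand-written SRL spans.
roles = {"A0": (0, 0), "V": (1, 1), "A2": (2, 2), "A1": (3, 4)}
hyp_a = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]  # every token aligned
hyp_b = [(0, 0), (1, 1), (3, 2)]                  # "Mary" and "book" unaligned
print(coverage(roles, hyp_a), coverage(roles, hyp_b))  # 1.0 0.75
```

Comparing such scores for two hypotheses of the same source gives a crude pairwise preference (here, `hyp_a` over `hyp_b`); a learned model would combine many such signals rather than rely on one.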

Talk. Lluís Màrquez. Automatic evaluation in Machine Translation: Towards combined linguistically-motivated measures (2011/05/10)

Kepa Sarasola — Mon, 09 May 2011 10:10:00 +0000

Speaker: Lluís Màrquez
NLPRG, TALP
Technical University of Catalonia, UPC

Date: May 10, 2011
Time: 15:30
Where: Computer Science Faculty, Room 3.2

Automatic evaluation in Machine Translation:
Towards combined linguistically-motivated measures

Automatic evaluation plays a very important role in the development and comparison of machine translation systems. In this talk we will review the current trend towards linguistically guided evaluation measures based on several linguistic layers, and towards their combination. We will also discuss confidence estimation measures, a particular class of measures that assess output quality without requiring reference translations. Finally, we will describe the role of evaluation measures in the FAUST European project (Feedback Analysis for User Adaptive Statistical Translation; http://www.faust-fp7.eu/), focusing on the use of user feedback to guide the combination of measures.