Semantic Textual Similarity Wiki

Welcome to the Semantic Textual Similarity (STS) wiki page. Use this page to find and share STS resources. Feel free to update and complete the information; you only need to create an account (see link above right). Please contact us if in doubt (aitor dot gonzalezagirre at gmail).

Refer to the datasets below for more information on specific STS tasks, including the spin-off Interpretable STS task.

Join the low-traffic mailing list for updates on STS.

STS benchmark

The STS Benchmark comprises a selection of the English datasets used in the STS tasks organized in the context of SemEval between 2012 and 2017. The selection includes text from image captions, news headlines and user forums.

To provide a standard benchmark for comparing meaning representation systems in future years, we organized it into train, development and test sets, covering three selected genres (news, captions, forums).

For comparable research starting 2017, we suggest using the STS benchmark.

More details here

Evaluation data

You can find the evaluation data in the task websites, or you can download local copies from here:

STS 2017, SemEval task 1:
- Monolingual English, Arabic, Spanish, and cross-lingual English-Arabic, -Spanish and -Turkish: Test pairs and gold standard
STS 2016, SemEval task 1:
- English: Test
- Cross-lingual English-Spanish: Trial, Test
STS 2015, SemEval task 2:
- English: Trial, Test, Raw annotations and Perl Scripts
- Spanish: Test
STS 2014, SemEval task 10:
- English: Test, System submissions
- Spanish: Trial, Test
STS 2013:
- STS core task: Trial, Test, System submissions
- Typed similarity task: Trial, Train, Test, System submissions
STS 2012: Trial, Train, Test, System submissions

You can also find all the datasets in one place, along with various data loading and ingestion tools and baselines, in GitHub repos:

Software and Resources

DKPro Similarity - Baseline STS system based on UKP Lab's top performing system from SemEval 2012 (Java).
TakeLab - TakeLab's top performing system from SemEval 2012 (Python).
DLS@CU - The aligner used in the top performing system from SemEval 2014 (Python).

Evaluation tasks

NEW STS 2017, SemEval task 1 (English and Crosslingual)
STS 2016, SemEval task 1 (English and new Crosslingual subtask)
STS 2015, SemEval task 2 (English, Spanish and new Interpretability subtask)
STS 2014, SemEval task 10 (English and Spanish)
STS 2013, the *SEM shared task:
- STS core task
- Typed similarity task
STS 2012, a SemEval task

Other STS datasets have also been produced by third parties:

SemEval 2015 task 1 includes STS for Twitter pairs [1]
SemEval 2014 task 1 includes STS data [2]
SemEval 2014 task 3 includes similarity between different levels [3]

Interpretable STS tasks (see below for more details):

Interpretable STS 2016, SemEval task 2
STS 2015, SemEval task 2 (English, Spanish and new Interpretability subtask)

Interpretable STS

Given two sentences of text, s1 and s2, the systems participating in STS compute how similar s1 and s2 are, returning a similarity score. Although the score is useful for many tasks, it does not reveal which parts of the sentences are equivalent (or very close) in meaning and which are not. The aim of interpretable STS is to explore whether systems are able to explain WHY they think the two sentences are related or unrelated, adding an explanatory layer to the similarity score.

The explanatory layer consists of an alignment of chunks across the two sentences, where alignments are annotated with a similarity score and a relation label.
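As a rough illustration of the structure described above, the following minimal sketch represents a sentence pair together with its explanatory layer. All class names, field names, relation labels, score values and the example sentences are hypothetical and chosen only for illustration; they are not the official annotation format or label set of the task.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChunkAlignment:
    """One alignment between a chunk of s1 and a chunk of s2."""
    chunk1: str     # chunk taken from s1
    chunk2: str     # chunk taken from s2
    score: float    # similarity score of this alignment (scale is assumed)
    relation: str   # relation label (placeholder names, not the official set)


@dataclass
class InterpretablePair:
    """A sentence pair, its overall score, and the explanatory layer."""
    s1: str
    s2: str
    similarity: float  # sentence-level similarity score
    alignments: List[ChunkAlignment] = field(default_factory=list)

    def explain(self) -> List[str]:
        """Render each alignment as one human-readable line."""
        return [
            f"'{a.chunk1}' <-> '{a.chunk2}': {a.relation} ({a.score:.1f})"
            for a in self.alignments
        ]


# Made-up example pair; scores and labels are invented for illustration.
pair = InterpretablePair(
    s1="A man is playing a guitar",
    s2="A person plays an instrument",
    similarity=3.5,
    alignments=[
        ChunkAlignment("A man", "A person", 4.0, "MORE_SPECIFIC"),
        ChunkAlignment("is playing", "plays", 5.0, "EQUIVALENT"),
        ChunkAlignment("a guitar", "an instrument", 4.0, "MORE_SPECIFIC"),
    ],
)

for line in pair.explain():
    print(line)
```

Keeping the sentence-level score separate from the per-alignment scores mirrors the idea that the explanatory layer is added on top of the similarity score rather than replacing it.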

Tasks and data:

A full task in Interpretable STS 2016, SemEval task 2: Train1, Train2, Test
A pilot subtask in STS 2015, SemEval task 2: Train, Test

Note that the 2016 training data includes a re-annotated version of the 2015 pairs (Train1) plus new sentence pairs from student assessment (Train2). The test sentence pairs are all new.

Papers

STS Task Overviews
- 2017 SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, Daniel Cer; Mona Diab; Eneko Agirre; Inigo Lopez-Gazpio; Lucia Specia. Proceedings of SemEval 2017.
- 2016 SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation, Eneko Agirre; Carmen Banea; Daniel Cer; Mona Diab; Aitor Gonzalez-Agirre; Rada Mihalcea; German Rigau; Janyce Wiebe. Proceedings of SemEval 2016.
- 2015 SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability, Eneko Agirre; Carmen Banea; Claire Cardie; Daniel Cer; Mona Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Inigo Lopez-Gazpio; Montse Maritxalar; Rada Mihalcea; German Rigau; Larraitz Uria; Janyce Wiebe. Proceedings of SemEval 2015.
- 2014 SemEval-2014 Task 10: Multilingual Semantic Textual Similarity, Eneko Agirre; Carmen Banea; Claire Cardie; Daniel Cer; Mona Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Rada Mihalcea; German Rigau; Janyce Wiebe. Proceedings of SemEval 2014.
- 2013 *SEM 2013 shared task: Semantic Textual Similarity, Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, *SEM 2013
- 2012 SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, SemEval 2012

Interpretable STS Task Overviews
- 2016 Interpretable Semantic Textual Similarity: Finding and explaining differences between sentences (preprint on arXiv; final version in Knowledge-Based Systems), Inigo Lopez-Gazpio; Montse Maritxalar; Aitor Gonzalez-Agirre; German Rigau; Larraitz Uria; Eneko Agirre. Knowledge-Based Systems. ISSN: 0950-7051. DOI: http://dx.doi.org/10.1016/j.knosys.2016.12.013
- 2016 SemEval-2016 Task 2: Interpretable Semantic Textual Similarity, Eneko Agirre; Aitor Gonzalez-Agirre; Inigo Lopez-Gazpio; Montse Maritxalar; German Rigau; Larraitz Uria. Proceedings of SemEval 2016.
- 2015 SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability, Eneko Agirre; Carmen Banea; Claire Cardie; Daniel Cer; Mona Diab; Aitor Gonzalez-Agirre; Weiwei Guo; Inigo Lopez-Gazpio; Montse Maritxalar; Rada Mihalcea; German Rigau; Larraitz Uria; Janyce Wiebe. Proceedings of SemEval 2015.

Selected System Papers
- 2015
  - English Task: DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition, Md Arafat Sultan; Steven Bethard; Tamara Sumner, *SEM 2015
  - Interpretable subtask: NeRoSim: A System for Measuring and Interpreting Semantic Textual Similarity, Rajendra Banjade; Nobal B. Niraula; Nabin Maharjan; Vasile Rus; Dan Stefanescu; Mihai Lintean; Dipesh Gautam, *SEM 2015
- 2014
  - English Task: DLS@CU: Sentence Similarity from Word Alignment, Md Arafat Sultan; Steven Bethard; Tamara Sumner, *SEM 2014
- 2013
  - Core Task: UMBC_EBIQUITY-CORE: Semantic Textual Similarity Systems, Lushan Han, Abhay L. Kashyap, Tim Finin, James Mayfield, Jonathan Weese, *SEM 2013
  - Typed-similarity Task: UNITOR-CORE_TYPED: Combining Text Similarity and Semantic Filters through SV Regression, Danilo Croce, Valerio Storch, Roberto Basili, *SEM 2013
- 2012
  - TakeLab: Systems for Measuring Semantic Text Similarity, Frane Šarić, Goran Glavaš, Mladen Karan, Jan Šnajder and Bojana Dalbelo Bašić, SemEval 2012
  - UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures, Daniel Bär, Chris Biemann, Iryna Gurevych, and Torsten Zesch, SemEval 2012

Proceedings
- Proceedings of STS 2016: check relevant papers of SemEval 2016 proceedings
- Proceedings of STS 2015: check relevant papers of SemEval 2015 proceedings
- Proceedings of STS 2014: check relevant papers of SemEval 2014 proceedings
- Proceedings of STS 2013: check relevant papers of *SEM 2013 Shared Task proceedings
- Proceedings of STS 2012: check relevant papers of SemEval 2012 proceedings

Sample of papers where STS is used for evaluation

Sanjeev Arora, Yingyu Liang, Tengyu Ma. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. ICLR 2017.

Jiaqi Mu, Suma Bhat, Pramod Viswanath. All-but-the-Top: Simple and Effective Postprocessing for Word Representations. arXiv:1702.01417. 2017

John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu. Towards Universal Paraphrastic Sentence Embeddings. ICLR 2016

Tamara Polajnar, Laura Rimell, and Stephen Clark. Evaluation of Simple Distributional Compositional Operations on Longer Texts. LREC. 2014

Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. From image description to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, Vol. 2, 2014

Islam Beltagy, Katrin Erk and Raymond Mooney. Semantic Parsing using Distributional Semantics and Probabilistic Logic. Extended abstract. Proceedings of ACL 2014 Workshop on Semantic Parsing (SP-2014). 2014.

Islam Beltagy, Katrin Erk and Raymond Mooney. Probabilistic Soft Logic for Semantic Textual Similarity. Proceedings of ACL 2014.

Islam Beltagy, Chau Kim Cuong, Gemma Boleda, Dan Garrette, Katrin Erk, and Raymond Mooney. Montague meets Markov: Deep semantics with probabilistic logical form. Proceedings of *SEM. 2013.

Eduardo Blanco and Dan I. Moldovan. A Logic Prover Approach to Predicting Textual Similarity. FLAIRS Conference. 2013.

Eduardo Blanco and Dan I. Moldovan. A Semantically Enhanced Approach to Determine Textual Similarity. EMNLP. 2013.

Weiwei Guo; Mona Diab. Improving Lexical Semantics for Sentential Semantics: Modeling Selectional Preference and Similar Words in a Latent Variable Model. Proceedings of NAACL. 2013.

Mohammad Taher Pilehvar; David Jurgens; Roberto Navigli. Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity. Proceedings of ACL. 2013.

Croce, D.; Storch, V.; Annesi, P.; Basili, R. Distributional Compositional Semantics and Text Similarity. 2012 IEEE Sixth International Conference on Semantic Computing (ICSC), pp. 242-249, 19-21 Sept. 2012.

Contact

Eneko Agirre.

About this page: The information and links presented here are jointly maintained by research sites involved.
