Main Page

Semantic Textual Similarity Wiki

Welcome to the Semantic Textual Similarity (STS) wiki page. Use this page to find and share STS resources. Please feel free to update and complete the information here; you just need to create an account (see the link at the top right). Please contact us if in doubt (aitor dot gonzalezagirre at gmail).

Refer to the datasets below for more information on specific STS tasks, including the spin-off Interpretable STS task.

Join the low-traffic mailing list for updates on STS.

STS benchmark

NEW: The STS Benchmark comprises a selection of the English datasets used in the STS tasks organized in the context of SemEval between 2012 and 2017. The selected datasets include text from image captions, news headlines and user forums.

To provide a standard benchmark for comparing meaning representation systems in future years, we organized it into training, development and test splits covering three selected genres (news, captions, forums).
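As a rough illustration of working with the benchmark splits, the sketch below parses pair records and counts them per genre. It assumes each line holds seven tab-separated fields (genre, source file, year, pair id, gold score, sentence 1, sentence 2); that layout, the `STSPair` class, the helper names and the two example lines are our own assumptions, so check them against the files you actually download.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class STSPair:
    genre: str       # e.g. news, captions or forums (exact strings assumed)
    source: str      # original dataset/file the pair came from
    year: str
    pair_id: str
    score: float     # gold similarity score, 0 (unrelated) to 5 (equivalent)
    sentence1: str
    sentence2: str


def parse_line(line: str) -> STSPair:
    # ASSUMPTION: seven tab-separated fields in the order above;
    # any extra trailing fields are ignored.
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    genre, source, year, pair_id, score, s1, s2 = fields[:7]
    return STSPair(genre, source, year, pair_id, float(score), s1, s2)


def genre_counts(lines) -> Counter:
    # Count pairs per genre, skipping blank lines.
    return Counter(parse_line(line).genre for line in lines if line.strip())


# Made-up example lines in the assumed layout (not real benchmark rows).
example_lines = [
    "main-captions\tcaptions-file\t2012\t0001\t5.000\tA man is playing a guitar.\tA man plays a guitar.",
    "main-news\tnews-file\t2016\t0002\t1.200\tStocks fall in Asia.\tRain expected in Paris.",
]
```

Counting pairs per genre in each of the training, development and test files is a quick sanity check that all three genres are represented in every split.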

For comparable research from 2017 onwards, we suggest using the STS benchmark.

More details here

Evaluation data


You can find the evaluation data on the task websites, or you can download local copies from here:

You can also find all the datasets in a single place, together with various data loading and ingestion tools and baselines, in the following GitHub repositories:
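Once a system has produced scores for the evaluation pairs, STS results are typically reported as the Pearson correlation between the system scores and the gold-standard scores. The function below is a minimal from-scratch sketch of that computation; the name `pearson` and its error handling are our own choices, and on Python 3.10+ `statistics.correlation` computes the same coefficient.

```python
import math
from typing import Sequence


def pearson(gold: Sequence[float], system: Sequence[float]) -> float:
    """Pearson correlation between gold scores and system scores."""
    if len(gold) != len(system) or len(gold) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(gold)
    mean_g = sum(gold) / n
    mean_s = sum(system) / n
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((g - mean_g) * (s - mean_s) for g, s in zip(gold, system))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_s = sum((s - mean_s) ** 2 for s in system)
    if var_g == 0 or var_s == 0:
        raise ValueError("correlation is undefined when either side is constant")
    return cov / math.sqrt(var_g * var_s)
```

Because Pearson correlation is unchanged by a positive linear rescaling of either side, a system's raw scores do not have to lie in the 0 to 5 gold range to be evaluated; only how well they track the gold scores linearly matters.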

Software and Resources

  • DKPro Similarity - Baseline STS system based on UKP Lab's top performing system from SemEval 2012 (Java).
  • TakeLab - TakeLab's top performing system from SemEval 2012 (Python).
  • DLS@CU - The aligner used in the top performing system from SemEval 2014 (Python).

Evaluation tasks

Other STS datasets have also been produced by third parties:

  • SemEval 2015 Task 1 includes STS for Twitter pairs [1]
  • SemEval 2014 Task 1 includes STS data [2]
  • SemEval 2014 Task 3 includes similarity between texts at different levels, from paragraphs down to words (cross-level semantic similarity) [3]

Interpretable STS tasks (see below for more details):

Interpretable STS

Given two sentences of text, s1 and s2, the systems participating in STS compute how similar s1 and s2 are, returning a similarity score. Although the score is useful for many tasks, it does not reveal which parts of the sentences are equivalent in meaning (or very close in meaning) and which are not. The aim of interpretable STS is to explore whether systems are able to explain WHY they consider the two sentences related or unrelated, adding an explanatory layer to the similarity score.
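The input/output contract described above (two sentences in, one similarity score out) can be illustrated with a deliberately naive baseline: bag-of-words cosine similarity, scaled to the 0 to 5 range used in the STS tasks. This is a hypothetical toy for illustration only, not any of the systems listed on this page; `toy_sts_score` and its crude tokenizer are our own choices.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    # Crude tokenizer: lowercase and keep runs of letters and digits.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def toy_sts_score(s1: str, s2: str) -> float:
    """Hypothetical baseline: bag-of-words cosine similarity scaled to 0-5."""
    a, b = tokens(s1), tokens(s2)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    if norm == 0:
        # At least one sentence has no tokens: treat as unrelated.
        return 0.0
    return 5.0 * dot / norm
```

The function returns only a number: it cannot say which words or phrases drove the score, which is exactly the gap the explanatory layer is meant to fill.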

The explanatory layer consists of an alignment of chunks across the two sentences, where alignments are annotated with a similarity score and a relation label.
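To make the explanatory layer more concrete, here is a minimal sketch of one way to represent chunk alignments in Python. The relation labels listed (EQUI, OPPO, SPE1, SPE2, SIMI, REL, NOALI) are recalled from the Interpretable STS guidelines and should be checked against the official task description; the `ChunkAlignment` class, the `describe` helper and its output format, the example sentences and their scores are illustrative assumptions, not gold annotations or the official submission format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# ASSUMPTION: label inventory as recalled from the Interpretable STS guidelines.
RELATION_LABELS = {"EQUI", "OPPO", "SPE1", "SPE2", "SIMI", "REL", "NOALI"}


@dataclass(frozen=True)
class ChunkAlignment:
    chunk1: Tuple[int, ...]   # token indices of the chunk in s1 (empty if unaligned)
    chunk2: Tuple[int, ...]   # token indices of the chunk in s2 (empty if unaligned)
    label: str                # relation label, e.g. "EQUI"
    score: Optional[float]    # similarity 0-5; None for unaligned chunks

    def __post_init__(self):
        if self.label not in RELATION_LABELS:
            raise ValueError(f"unknown relation label: {self.label}")
        if self.label == "NOALI":
            # An unaligned chunk exists on exactly one side and carries no score.
            if bool(self.chunk1) == bool(self.chunk2) or self.score is not None:
                raise ValueError("NOALI needs exactly one chunk and no score")
        elif not self.chunk1 or not self.chunk2:
            raise ValueError("aligned labels need a chunk on both sides")
        elif self.score is None or not 0 <= self.score <= 5:
            raise ValueError("aligned chunks need a score in [0, 5]")


def describe(al: ChunkAlignment, toks1, toks2) -> str:
    # Ad-hoc human-readable rendering (not the official task format).
    left = " ".join(toks1[i] for i in al.chunk1) or "-"
    right = " ".join(toks2[i] for i in al.chunk2) or "-"
    score = "NIL" if al.score is None else f"{al.score:g}"
    return f"{left} <==> {right} // {al.label} // {score}"


# Illustrative example; labels and scores are made up, not gold data.
tokens1 = ["A", "man", "is", "playing", "a", "guitar"]
tokens2 = ["A", "person", "plays", "the", "guitar"]
alignments = [
    ChunkAlignment((0, 1), (0, 1), "SPE1", 4.0),  # "A man" taken as more specific than "A person"
    ChunkAlignment((2, 3), (2,), "EQUI", 5.0),
    ChunkAlignment((4, 5), (3, 4), "EQUI", 5.0),
]
```

Storing chunks as token-index tuples rather than strings keeps an alignment unambiguous when a word occurs more than once in a sentence.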

Tasks and data:

Note that the 2016 training data include a re-annotated version of the 2015 pairs (Train1) plus new sentence pairs from student assessment (Train2). The test sentence pairs are all new.


Sample of papers where STS is used for evaluation

  • Sanjeev Arora, Yingyu Liang, Tengyu Ma. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. ICLR 2017.
  • Jiaqi Mu, Suma Bhat, Pramod Viswanath. All-but-the-Top: Simple and Effective Postprocessing for Word Representations. arXiv:1702.01417. 2017.
  • John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu. Towards Universal Paraphrastic Sentence Embeddings. ICLR 2016.
  • Tamara Polajnar, Laura Rimell, and Stephen Clark. Evaluation of Simple Distributional Compositional Operations on Longer Texts. LREC 2014.
  • Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. From image description to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, Vol. 2, 2014
  • Islam Beltagy, Katrin Erk and Raymond Mooney. Semantic Parsing using Distributional Semantics and Probabilistic Logic. Extended abstract. Proceedings of ACL 2014 Workshop on Semantic Parsing (SP-2014). 2014.
  • Islam Beltagy, Katrin Erk and Raymond Mooney. Probabilistic Soft Logic for Semantic Textual Similarity. Proceedings of ACL 2014.
  • Islam Beltagy, Chau Kim Cuong, Gemma Boleda, Dan Garrette, Katrin Erk, and Raymond Mooney. Montague meets Markov: Deep semantics with probabilistic logical form. Proceedings of *SEM. 2013.
  • Eduardo Blanco and Dan I. Moldovan. A Logic Prover Approach to Predicting Textual Similarity. FLAIRS Conference. 2013.
  • Eduardo Blanco and Dan I. Moldovan. A Semantically Enhanced Approach to Determine Textual Similarity. EMNLP. 2013.
  • Weiwei Guo and Mona Diab. Improving Lexical Semantics for Sentential Semantics: Modeling Selectional Preference and Similar Words in a Latent Variable Model. Proceedings of NAACL. 2013.
  • Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity. Proceedings of ACL. 2013.
  • D. Croce, V. Storch, P. Annesi, and R. Basili. Distributional Compositional Semantics and Text Similarity. 2012 IEEE Sixth International Conference on Semantic Computing (ICSC), pp. 242-249, 19-21 Sept. 2012.


Eneko Agirre.

About this page: The information and links presented here are jointly maintained by the research sites involved.