The goal of the Semantic Textual Similarity (STS) task is to create a unified framework for evaluating semantic textual similarity modules and for characterizing their impact on NLP applications. STS measures the degree of semantic equivalence between two text snippets. The framework is intended to allow an extrinsic evaluation of multiple semantic components that have historically been evaluated independently, without characterizing their impact on NLP applications.

STS is related to both Textual Entailment (TE) and Paraphrase, but it differs from them in a number of ways and is more directly applicable to a number of NLP tasks. STS differs from TE in that it assumes bidirectional graded equivalence between the pair of textual snippets. In TE the equivalence is directional: a car is a vehicle, but a vehicle is not necessarily a car. STS also differs from both TE and Paraphrase in that, rather than a binary yes/no decision (e.g. a vehicle is not a car), it is a graded notion of similarity (e.g. a vehicle and a car are more similar than a wave and a car). This graded, bidirectional nature makes STS useful for NLP tasks such as MT evaluation, information extraction, question answering, and summarization.
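The contrast between a symmetric graded score and a directional one can be illustrated with a toy sketch. Token overlap is used here only as a stand-in; it is not the method of the task or of any participating system. The function names and the two example sentences are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a snippet and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(a, b):
    """Symmetric graded score in [0, 1]: shared tokens over all tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def coverage(premise, hypothesis):
    """Directional score: fraction of hypothesis tokens present in premise."""
    tp, th = tokens(premise), tokens(hypothesis)
    if not th:
        return 1.0
    return len(th & tp) / len(th)


# Hypothetical example pair (not from the task data).
long_text = "A red car is parked outside"
short_text = "A car is parked"

# The similarity score is the same in both directions.
sim_forward = jaccard_similarity(long_text, short_text)
sim_backward = jaccard_similarity(short_text, long_text)

# The directional score depends on which snippet plays which role.
cov_long_to_short = coverage(long_text, short_text)
cov_short_to_long = coverage(short_text, long_text)
```

Swapping the arguments leaves the Jaccard score unchanged, while the coverage score changes with direction, which mirrors the bidirectional versus directional distinction drawn above.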

Current textual similarity systems are limited in the scope of similarity they can address, covering mostly lexical and syntactic similarity. Other linguistic phenomena have been addressed only rarely, in isolated efforts, e.g. metaphorical or idiomatic language [John spilled his guts to Mary vs. John told Mary all about his stories/life], scoping and under-specification [Every representative of the company saw every sample], sentences whose structure is very divergent [The annihilation of Rome in 2000 BC was incurred by an insurgency of the slaves. vs. The slaves' revolution 2 millennia before Christ destroyed the capital of the Roman Empire.], and various modality phenomena such as committed belief, permission or negation. The STS task aims to foster joint research efforts in these areas, which have been fragmented to date.
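A rough sketch of why purely lexical similarity falls short on idiomatic language: a bag-of-words cosine, used here as an assumed stand-in for a lexical baseline, scores the idiom pair above lower than a pair that shares more words but not the meaning. The third sentence is a hypothetical contrast added for illustration and does not come from the task data.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word tokens in a snippet."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words count vectors."""
    ca, cb = bag_of_words(a), bag_of_words(b)
    dot = sum(ca[word] * cb[word] for word in ca)
    norm_a = math.sqrt(sum(count * count for count in ca.values()))
    norm_b = math.sqrt(sum(count * count for count in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


idiom = "John spilled his guts to Mary"
paraphrase = "John told Mary all about his life"
# Hypothetical distractor: high word overlap, different meaning.
distractor = "John spilled his drink on Mary"

score_paraphrase = cosine_similarity(idiom, paraphrase)
score_distractor = cosine_similarity(idiom, distractor)
```

Under these assumptions the distractor receives the higher score, because the baseline sees only shared words and has no notion that "spilled his guts" is an idiom.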

In 2012 we held the first pilot task at SemEval 2012, as part of the *SEM 2012 conference, with great success: 35 teams and 88 runs (Agirre et al. 2012). In addition, we held a DARPA-sponsored workshop at Columbia.

New in 2013:

  • STS has been selected as the official shared task of the *SEM 2013 conference. Registration now open! (see shared task tabs on the left)
  • STS common, an open source shared annotation and inference pipeline for STS (see data and open source pipeline tab on the left)
  • A comprehensive list of evaluation tasks and datasets, software (including strong open-source baselines such as DKPro) and papers related to STS is available on a collaboratively maintained site, open to the STS community

If you are interested in the task, please join the mailing list for updates.


Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity. *SEM 2012: The First Joint Conference on Lexical and Computational Semantics, Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012). Montreal. 2012.



- *SEM program (incl. papers)
- *SEM registration open
- System runs available
- Gold standard data now available
- Results now available
- Train data for pilot on typed similarity available
- Trial data available
- STS selected as shared task of *SEM 2013
- Please join the mailing list for updates