New to STS 2013, we provide STS common, a shared annotation and inference pipeline for STS. Strong open-source baselines such as DKPro can be found in the STS wiki (http://ixa2.si.ehu.es/stswiki/), a collaboratively maintained site open to the STS community. The wiki offers a comprehensive list of evaluation tasks, datasets, software, and papers related to STS.
The data related to STS 2013 comprises the following:
STS Core task:
- Initial training data, consisting of the 2012 training and test data. It spans five datasets: paraphrase sentence pairs (MSRpar), sentence pairs from video descriptions (MSRvid), MT evaluation sentence pairs (MTnews and MTeuroparl) and gloss pairs (OnWN). This data is now included in the trial data for the core STS task (see below).
- Trial data for the core STS task, including all the data from STS 2012 (additional details). Note that there is no new training data in 2013, but you can use the 2012 data.
- Test data with gold standard annotations. IMPORTANT NOTICE: Due to license restrictions, the SMT data must be downloaded from LDC; see http://catalog.ldc.upenn.edu/LDC2013T18.
- System submissions (you can also download 2012 system submissions).
STS Typed-similarity pilot task:
- Trial data for the typed-similarity pilot task (additional details).
- Training data for the typed-similarity pilot task, including a tool for visualizing pairs of items (additional details).
- Test data with gold standard annotations.
- System submissions.