Task #1: Evaluating WSD on Cross-Language Information Retrieval
A joint Semeval-2007 / CLEF (Cross-Language Evaluation Forum) task
Eneko Agirre
(University of the Basque Country)
Bernardo Magnini (ITC/IRST)
German Rigau (University of the Basque Country)
Piek Vossen (Irion BV)
- Results available (see below)
- Release of trial data (Semeval website)
- Release of word-by-word features extracted from the trial data documents and topics, as well as from SemCor 1.6 (see below).
- Release of scripts to build output files from word-by-word results (see below)
- End User Agreement with CLEF available (word, pdf). Please return it signed by fax and mail to the following prior to registering:
Fax number: (to the attention of Eneko Agirre) +34 943 015 590
Address:
Eneko Agirre
Informatika Fakultatea
Manuel Lardizabal pasealekua, 1
20.018 - Donostia
Basque Country (Spain)
Mailing list
Instructions for participation
Please follow these steps in order to participate:
1. download the trial data
2. fax and mail a signed copy of the end user agreement (word, pdf) (please add a contact e-mail)
3. the organizers will send a user/pwd by e-mail
4. register your team in the Semeval website
5. click on the download button of the registration website (from Feb. 26 onwards; this is when your 14-day period begins)
6. follow the instructions in the downloaded file and download the actual data files (you will need the user/pwd)
7. once you are done, upload the output files in the Semeval website (prior to Mar. 26 and no later than 14 days after performing step 5)
In addition to the general participation guidelines, note also the following:
- by participating in Semeval-2007 you grant permission for future CLEF-2008 participants to use your automatically annotated data for research purposes.
- the results returned by participants need to conform to the DTDs provided by the organizers; otherwise we cannot guarantee to be able to score those results. Software to validate the results is provided with the trial and test data.
- given the workload of expanding and scoring the systems, the deadline for uploading results (26 of March) is tighter than for other Semeval tasks.
- given the amount of text to be tagged, participants have 2 weeks to submit results, starting from test data download time.
General design
The participants will be provided with (see Full Description for more details):
1. the document collections (.nam+id files)
2. the topics (.nam+id files)
The participants need to return a single compressed file with the input files enriched with WordNet 1.6 sense tags:
1. for all the documents in the collection (.wsd files)
2. for all the topics (.wsd files)
All files are in XML. Input documents and topics (.nam+id files) will follow the "docs.nam+id.dtd" and "topics.nam+id.dtd" DTDs respectively. Output documents and topics (.wsd files) will follow the "docs.wsd.dtd" and "topics.wsd.dtd" DTDs respectively. The result files must be organized into a directory structure that mirrors the directory structure of the input.
See the trial data release for further details and examples of files.
Note that all senses returned by participants will be expanded, regardless of their weight. The current CLIR system does not use the weight information.
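The validation software distributed with the data is the reference check, but output files can also be pre-validated against the released DTDs directly. A minimal sketch in Python using lxml; the directory and file paths are illustrative:

    from pathlib import Path
    from lxml import etree

    # Paths are illustrative; use the DTDs shipped with the trial/test data.
    dtd = etree.DTD(open("dtds/docs.wsd.dtd"))

    # The output tree mirrors the input directory structure, so we can
    # simply walk it and validate every .wsd document file.
    for wsd_file in Path("output/docs").rglob("*.wsd"):
        tree = etree.parse(str(wsd_file))
        if not dtd.validate(tree):
            print(wsd_file, dtd.error_log.filter_from_errors())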
Evaluation
The organizers will run an internal IR and CLIR system based on (Vossen et al. 2006) on each of the participants' results as follows:
1. expand the returned sense tags to all synonyms in the target languages (Spanish and English) for both the documents and the queries (a sketch of this expansion follows the list)
2. index both original and expanded documents
3. perform the queries in two fashions:
   a. original queries on the expanded document collection
   b. expanded queries on the original document collection
4. compare the returned documents with the relevance judgements
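As an illustration of the expansion in step 1, here is a minimal sketch using NLTK's WordNet interface. Note the assumptions: NLTK ships a later WordNet than the 1.6 senses used in the task, and the organizers perform multilingual expansion with their own system, so this only shows the monolingual idea:

    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    def expand_sense(lemma, pos, sense_index):
        # Return all synonyms of one WordNet sense of `lemma`.
        synset = wn.synsets(lemma, pos=pos)[sense_index]
        return sorted({l.name().replace("_", " ") for l in synset.lemmas()})

    # e.g. expand_sense("bank", wn.NOUN, 1) ->
    # ['bank', 'banking company', 'banking concern',
    #  'depository financial institution']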
The participant systems will be scored according to standard IR/CLIR measures as implemented in the TREC evaluation package.
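All IR/CLIR figures reported below are MAP (mean average precision). A minimal sketch of the computation, assuming per-topic ranked results and relevance sets; the official figures come from the TREC evaluation package itself:

    def average_precision(ranking, relevant):
        # `ranking`: doc ids in ranked order; `relevant`: set of relevant ids.
        hits, total = 0, 0.0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                total += hits / rank
        return total / len(relevant) if relevant else 0.0

    def mean_average_precision(runs):
        # `runs`: list of (ranking, relevant) pairs, one per topic.
        return sum(average_precision(r, rel) for r, rel in runs) / len(runs)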
In addition, we will use some of the system outputs to compute WSD precision/recall. This will be done for analysis purposes only, as the official evaluation is that of the CLIR results.
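The precision, recall, and attempted figures reported in the WSD results below follow the usual definitions. A simplified sketch, assuming single-sense gold and predicted assignments keyed by instance id (the official Senseval scorer also handles weighted multiple assignments):

    def wsd_scores(gold, predicted):
        # `gold` and `predicted`: dicts mapping instance id -> sense tag.
        attempted = [i for i in gold if i in predicted]
        correct = sum(1 for i in attempted if predicted[i] == gold[i])
        precision = correct / len(attempted) if attempted else 0.0
        recall = correct / len(gold)
        coverage = len(attempted) / len(gold)  # the "attempted" column
        return precision, recall, coverage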
Schedule
Jan. 3      Trial data available
Jan. 17     End user agreement (EUA) available on the website
Feb. 26     Test data available for download (signed EUA required)
Mar. 26     Deadline for uploading results
June        SemEval Workshop at ACL
September   CLEF conference
Download area
Trial data is already available at the main Semeval website.
The software to validate input and output files is available with the trial data.
We also provide some of the widely used WSD features in a word-by-word fashion (Agirre et al. 2006) in order to make participation easier. These features will be available for both topics and documents (test data), as well as for all words with frequency above 10 in SemCor 1.6 (which can be taken as the training data for supervised WSD systems).
- WSD training features extracted from SemCor 1.6 (version 2 of the data, readme)
- WSD features extracted from:
- trial documents and topics (data, readme)
- test documents and topics (at test time)
In addition, we provide two scripts which take the word-by-word WSD results and build the output files.
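As a rough sketch of what such a script does, assume a hypothetical tab-separated word-by-word result with columns token id, sense key, and weight; the actual input and output formats are fixed by the released scripts and the .wsd DTDs, so treat everything here as illustrative:

    import csv
    import xml.etree.ElementTree as ET

    def build_wsd_file(wbw_path, out_path):
        # Hypothetical formats: the real ones are defined by the released
        # scripts and the docs.wsd.dtd / topics.wsd.dtd DTDs.
        root = ET.Element("document")
        with open(wbw_path, newline="") as f:
            for token_id, sense_key, weight in csv.reader(f, delimiter="\t"):
                term = ET.SubElement(root, "term", id=token_id)
                ET.SubElement(term, "sense", key=sense_key, weight=weight)
        ET.ElementTree(root).write(out_path, encoding="utf-8")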
System and Results
These are the anonymous results. The organizers submitted a system out of competition.
Semeval 2007: task #1 results
=============================
The two participating systems are anonymized as PART-A and PART-B respectively.
NOTE:
- PART-B returned WordNet 2.1 senses, so we had to map them automatically to 1.6 senses.
- In order to evaluate on the Senseval-2 and Senseval-3 all-words datasets, we had to map all WSD results automatically to the respective WordNet version.
=============== IR RESULTS ===============
* All results are MAP

A) No expansion whatsoever
   no expansion      0.3599

B) Expansion of topics (original documents)
   full expansion    0.1610
   1st sense         0.2862
   ORGANIZERS        0.2886
   PART-A            0.3030
   PART-B            0.3036

C) Expansion of documents (original topics)
   full expansion    0.1410
   1st sense         0.1172
   ORGANIZERS        0.1587
   PART-A            0.1521
   PART-B            0.1482

============ CLIR RESULTS ============

D) Translation (expansion) of English documents (Spanish topics)
   no expansion      0.1446
   full expansion    0.2676
   1st sense         0.2637
   ORGANIZERS        0.2664
   PART-A            0.1373
   PART-B            0.1734

============ WSD RESULTS ============

E) SENSEVAL-2 ALL-WORDS
                 precision   recall   attempted
   ORGANIZERS    0.584       0.577    93.61%
   PART-A        0.498       0.375    75.39%
   PART-B        0.388       0.240    61.92%

F) SENSEVAL-3 ALL-WORDS
                 precision   recall   attempted
   ORGANIZERS    0.591       0.566    95.76%
   PART-A        0.484       0.338    69.98%
   PART-B        0.334       0.186    55.68%
Contact
Eneko Agirre
References
Agirre E., Lopez de Lacalle Lekuona O., Martinez D. "Exploring feature set combinations for WSD". In Proceedings of the annual meeting of the SEPLN, Spain. 2006.
Vossen P., Rigau G., Alegria I., Agirre E., Farwell D., Fuentes M. "Meaningful results for Information Retrieval in the MEANING project". In Proceedings of the Third International WordNet Conference, Jeju Island (Korea). 2006.