Task #1: Evaluating WSD on Cross-Language Information Retrieval
A joint Semeval-2007 / CLEF (Cross-Language Evaluation Forum) task
Eneko Agirre
(University of the Basque Country)
Bernardo Magnini (ITC/IRST)
German Rigau (University of the Basque Country)
Piek Vossen (Irion BV)
- Results available (see below)
- Release of trial data (Semeval website)
- Release of word-by-word features extracted from the trial data documents and topics, as well as from SemCor 1.6 (see below).
- Release of scripts to build output files from word-by-word results (see below)
- End User Agreement with CLEF available (word, pdf). Please return it signed by fax and mail to the following prior to registering:
Fax number: (to the attention of Eneko Agirre) +34 943 015 590
Address:
Eneko Agirre
Informatika Fakultatea
Manuel Lardizabal pasealekua, 1
20.018 - Donostia
Basque Country (Spain)
Mailing list
Instructions for participation
Please follow these steps in order to participate:
1. download the trial data
2. fax and mail a signed copy of the end user agreement (word, pdf) (please add a contact e-mail)
3. the organizers will send a user/pwd by e-mail
4. register your team in the Semeval website
5. click on the download button of the registration website (from Feb. 26 onwards; this is when your 14-day period begins)
6. follow the instructions in the downloaded file and download the actual data files (you will need the user/pwd)
7. once you are done, upload the output files in the Semeval website (prior to Mar. 26 and no later than 14 days after performing step 5)
In addition to the general participation guidelines, note also the following:
- by participating in Semeval-2007 you grant permission for future CLEF-2008 participants to use your automatically annotated data for research purposes.
- the results returned by participants need to conform to the DTDs provided by the organizers; otherwise we cannot guarantee to be able to score those results. Software to validate the results is provided with the trial and test data.
- given the workload of expanding and scoring the systems, the deadline for uploading results (26 of March) is tighter than for other Semeval tasks.
- given the amount of text to be tagged, participants have 2 weeks to submit results, starting from test data download time.
General design
The participants will be provided with (see Full Description for more details):
1. the document collections (.nam+id files)
2. the topics (.nam+id files)
The participants need to return a single compressed file with the input files enriched with WordNet 1.6 sense tags:
1. for all the documents in the collection (.wsd files)
2. for all the topics (.wsd files)
All files are in XML. Input documents and topics (.nam+id files) will follow the "docs.nam+id.dtd" and "topics.nam+id.dtd" DTDs respectively. Output documents and topics (.wsd files) will follow the "docs.wsd.dtd" and "topics.wsd.dtd" DTDs respectively. The result files must be organized into a directory structure that mirrors the directory structure of the input.
See the trial data release for further details and examples of files.
Note that all senses returned by participants will be expanded, regardless of their weight. The current CLIR system does not use the weight information.
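The validation software distributed with the data is the reference check, but output files can also be pre-validated against the released DTDs directly. A minimal sketch in Python using lxml; the directory and file paths are illustrative:

    from pathlib import Path
    from lxml import etree

    # Paths are illustrative; use the DTDs shipped with the trial/test data.
    dtd = etree.DTD(open("dtds/docs.wsd.dtd"))

    # The output tree mirrors the input directory structure, so we can
    # simply walk it and validate every .wsd document file.
    for wsd_file in Path("output/docs").rglob("*.wsd"):
        tree = etree.parse(str(wsd_file))
        if not dtd.validate(tree):
            print(wsd_file, dtd.error_log.filter_from_errors())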
Evaluation
The organizers will run an internal IR and CLIR system based on (Vossen et al. 2006) on each of the participants' results as follows:
1. expand the returned sense tags to all synonyms in the target languages (Spanish and English) for both the documents and the queries (a sketch of this expansion follows the list)
2. index both original and expanded documents
3. perform the queries in two fashions:
   a. original queries on the expanded document collection
   b. expanded queries on the original document collection
4. compare the returned documents with the relevance judgements
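As an illustration of the expansion in step 1, here is a minimal sketch using NLTK's WordNet interface. Note the assumptions: NLTK ships a later WordNet than the 1.6 senses used in the task, and the organizers perform multilingual expansion with their own system, so this only shows the monolingual idea:

    from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

    def expand_sense(lemma, pos, sense_index):
        # Return all synonyms of one WordNet sense of `lemma`.
        synset = wn.synsets(lemma, pos=pos)[sense_index]
        return sorted({l.name().replace("_", " ") for l in synset.lemmas()})

    # e.g. expand_sense("bank", wn.NOUN, 1) ->
    # ['bank', 'banking company', 'banking concern',
    #  'depository financial institution']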
The participant systems will be scored according to standard IR/CLIR measures as implemented in the TREC evaluation package.
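All IR/CLIR figures reported below are MAP (mean average precision). A minimal sketch of the computation, assuming per-topic ranked results and relevance sets; the official figures come from the TREC evaluation package itself:

    def average_precision(ranking, relevant):
        # `ranking`: doc ids in ranked order; `relevant`: set of relevant ids.
        hits, total = 0, 0.0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                total += hits / rank
        return total / len(relevant) if relevant else 0.0

    def mean_average_precision(runs):
        # `runs`: list of (ranking, relevant) pairs, one per topic.
        return sum(average_precision(r, rel) for r, rel in runs) / len(runs)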
In addition, we will use some of the system outputs to compute WSD precision/recall. This will be done for analysis purposes only, as the official evaluation is that of the CLIR results.
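The precision, recall, and attempted figures reported in the WSD results below follow the usual definitions. A simplified sketch, assuming single-sense gold and predicted assignments keyed by instance id (the official Senseval scorer also handles weighted multiple assignments):

    def wsd_scores(gold, predicted):
        # `gold` and `predicted`: dicts mapping instance id -> sense tag.
        attempted = [i for i in gold if i in predicted]
        correct = sum(1 for i in attempted if predicted[i] == gold[i])
        precision = correct / len(attempted) if attempted else 0.0
        recall = correct / len(gold)
        coverage = len(attempted) / len(gold)  # the "attempted" column
        return precision, recall, coverage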
Schedule
Jan. 3      Trial data available
Jan. 17     End user agreement (EUA) available on the website
Feb. 26     Test data available for download (signed EUA required)
Mar. 26     Deadline for uploading results
June        SemEval Workshop at ACL
September   CLEF conference
Download area
Trial data is already available at the main Semeval website.
The software to validate input and output files is available with the trial data.
We also provide some of the widely used WSD features in a word-by-word fashion (Agirre et al. 2006) in order to make participation easier. These features will be available for both topics and documents (test data), as well as for all words with frequency above 10 in SemCor 1.6 (which can be taken as the training data for supervised WSD systems).
- WSD training features extracted from SemCor 1.6 (version 2 of the data, readme)
- WSD features extracted from:
- trial documents and topics (data, readme)
- test documents and topics (at test time)
In addition, we provide two scripts which take the word-by-word WSD results and build the output files.
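As a rough sketch of what such a script does, assume a hypothetical tab-separated word-by-word result with columns token id, sense key, and weight; the actual input and output formats are fixed by the released scripts and the .wsd DTDs, so treat everything here as illustrative:

    import csv
    import xml.etree.ElementTree as ET

    def build_wsd_file(wbw_path, out_path):
        # Hypothetical formats: the real ones are defined by the released
        # scripts and the docs.wsd.dtd / topics.wsd.dtd DTDs.
        root = ET.Element("document")
        with open(wbw_path, newline="") as f:
            for token_id, sense_key, weight in csv.reader(f, delimiter="\t"):
                term = ET.SubElement(root, "term", id=token_id)
                ET.SubElement(term, "sense", key=sense_key, weight=weight)
        ET.ElementTree(root).write(out_path, encoding="utf-8")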
System and Results
These are the anonymous results. The organizers submitted a system out of competition.
Semeval 2007: task #1 results
=============================
The two participating systems are anonymized as PART-A and PART-B respectively.
NOTE:
- PART-B returned WordNet 2.1 senses, so we had to map them automatically to 1.6 senses.
- In order to evaluate on the Senseval-2 and Senseval-3 all-words datasets, we had to map all WSD results automatically to the respective WordNet version.
=============== IR RESULTS ===============
* All results are MAP

A) No expansion whatsoever
   no expansion      0.3599

B) Expansion of topics (original documents)
   full expansion    0.1610
   1st sense         0.2862
   ORGANIZERS        0.2886
   PART-A            0.3030
   PART-B            0.3036

C) Expansion of documents (original topics)
   full expansion    0.1410
   1st sense         0.1172
   ORGANIZERS        0.1587
   PART-A            0.1521
   PART-B            0.1482

============ CLIR RESULTS ============

D) Translation (expansion) of English documents (Spanish topics)
   no expansion      0.1446
   full expansion    0.2676
   1st sense         0.2637
   ORGANIZERS        0.2664
   PART-A            0.1373
   PART-B            0.1734

============ WSD RESULTS ============

E) SENSEVAL-2 ALL-WORDS
                 precision   recall   attempted
   ORGANIZERS    0.584       0.577    93.61%
   PART-A        0.498       0.375    75.39%
   PART-B        0.388       0.240    61.92%

F) SENSEVAL-3 ALL-WORDS
                 precision   recall   attempted
   ORGANIZERS    0.591       0.566    95.76%
   PART-A        0.484       0.338    69.98%
   PART-B        0.334       0.186    55.68%
Contact
Eneko Agirre
References
Agirre E., Lopez de Lacalle Lekuona O., Martinez D. "Exploring feature set combinations for WSD". In Proceedings of the annual meeting of the SEPLN, Spain. 2006.
Vossen P., Rigau G., Alegria I., Agirre E., Farwell D., Fuentes M. "Meaningful results for Information Retrieval in the MEANING project". In Proceedings of the Third International WordNet Conference, Jeju Island (Korea). 2006.