Guidelines
Also see: http://ixa2.si.ehu.es/clirwsd/
Guidelines for Participation in the CLEF 2009 Ad-Hoc Track: Robust WSD Task
In these Guidelines, we provide information on the test collections, the tasks, data manipulation, query construction and results submission for the Robust WSD task of the CLEF 2009 Ad-Hoc track. Guidelines for the other CLEF tracks can be found on the dedicated webpages for these tracks.
MAIN TEST COLLECTION
In CLEF 2009 the Robust WSD monolingual and Robust WSD bilingual tasks use only the English (LA Times '94 and Glasgow Herald '95) collections.
Topics are released using the correct diacritics (according to the language) but may contain occasional spelling errors/inconsistencies and minor formatting deficiencies. We aim to keep these to a minimum.
Ad-hoc collections which were available at CLEF 2001:
- LA Times 94 (with WSD data)
- Glasgow Herald 95 (with WSD data)
TASKS
- monolingual IR (English)
- bilingual (Spanish -> English)
Much of the evaluation methodology adopted for these two tasks in CLEF is an adaptation of the strategy studied for the TREC ad-hoc task. The instructions given below have been derived from those distributed by TREC. We hope that they are clear and comprehensive; however, please do not hesitate to ask for clarifications or further information if you need it. Send queries to the organizers.
TOPICS
The test and train topics are distributed as follows:
- CLEF years 2001-2002 and 2004: for Training
- CLEF years 2003, 2005-2006: for Testing
- TREC year 2004: for Training and Testing
The CLEF topics are available in Spanish and English.
The TREC test topics (301-450) are available in English and Spanish, but the TREC train topics (601-700) will only be available in English. Note that we deleted the TREC topics that had no relevant document in the LA Times 94 collection, and thus have 143 topics in the 301-450 range (test) and 84 topics in the 601-700 range (train).
The following table summarizes the test/train topics and the corresponding target collections:
|       | Year      | Topics  | English Documents              |
| Train | CLEF 2001 | 41-90   | LA Times 94                    |
| Train | CLEF 2002 | 91-140  | LA Times 94                    |
| Train | CLEF 2004 | 201-250 | Glasgow Herald 95              |
| Train | TREC 2004 | 601-700 | LA Times 94                    |
| Test  | CLEF 2003 | 141-200 | LA Times 94, Glasgow Herald 95 |
| Test  | CLEF 2005 | 251-300 | LA Times 94, Glasgow Herald 95 |
| Test  | CLEF 2006 | 301-350 | LA Times 94, Glasgow Herald 95 |
| Test  | TREC 2004 | 301-450 | LA Times 94                    |

Test and Training Data for Robust 2009
CONSTRUCTING AND MANIPULATING THE SYSTEM DATA STRUCTURES FOR AD-HOC TRACKS
The system data structures are defined to consist of the original documents, any new structures built automatically from the documents (such as inverted files, thesauri, conceptual networks, etc.), and any new structures built manually from the documents (such as thesauri, synonym lists, knowledge bases, rules, etc.).
1. The system data structures may not be modified in response to the test topics. For example, you cannot add topic words that are not in your dictionary. The CLEF tasks represent the real-world problem of an ordinary user posing a question to a system. In the case of the cross-language tasks, the question is posed in one language and relevant documents must be retrieved whatever the language in which they have been written. If an ordinary user could not make the change to the system, you should not make it after receiving the topics.
2. There are several parts of the CLEF data collections that contain manually-assigned, controlled or uncontrolled index terms. These fields are delimited by SGML (XML-compatible) tags. Since the primary focus of CLEF is on retrieval of naturally occurring text over language boundaries, these manually-indexed terms should not be indiscriminately used as if they are a normal part of the text.
3. Only the following fields may be used for automatic retrieval:
- LA Times 94: HEADLINE, TEXT only
- Glasgow Herald 95: HEADLINE, TEXT only
Learning from (e.g. building translation sources from) such fields is permissible.
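As a rough illustration of point 3, the sketch below restricts a document to the permitted fields before indexing. The tag names HEADLINE, TEXT and DOCNO follow the descriptions in these guidelines, but the assumption that each document is a flat SGML record parsable with regular expressions is ours; adapt the parsing to the actual collection layout.

# Minimal sketch: keep only the fields permitted for automatic retrieval.
# Assumes flat SGML records with <DOCNO>, <HEADLINE> and <TEXT> elements;
# adjust to the real structure of the LA Times 94 / Glasgow Herald 95 files.
import re

ALLOWED_FIELDS = ("HEADLINE", "TEXT")

def indexable_text(doc_sgml: str) -> tuple[str, str]:
    """Return (docno, concatenated text of the permitted fields only)."""
    docno = re.search(r"<DOCNO>\s*(.*?)\s*</DOCNO>", doc_sgml, re.S).group(1)
    parts = []
    for field in ALLOWED_FIELDS:
        for chunk in re.findall(rf"<{field}>(.*?)</{field}>", doc_sgml, re.S):
            parts.append(chunk.strip())
    return docno, "\n".join(parts)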
GUIDELINES FOR CONSTRUCTING THE QUERIES
The queries are constructed from the topics. Each topic consists of three fields: a brief title statement; a one-sentence description; a more complex narrative specifying the relevance assessment criteria. Queries can consist of one or more of these fields.
There are many possible methods for converting the supplied topics into queries that your system can execute. We have broadly defined two generic methods, "automatic" and "manual", based on whether manual intervention is used or not. When more than one set of results is submitted, the different sets may correspond to different query construction methods or, if desired, can be variants within the same method. Only automatic runs are allowed in this task.
The manual query construction method includes BOTH runs in which the queries are constructed manually and then run without looking at the results AND runs in which the results are used to alter the queries using some manual operation. The distinction is being made here between runs in which there is no human involvement (automatic query construction) and runs in which there is some type of human involvement (manual query construction). It is clear that manual runs should be appropriately motivated in a CLIR context; e.g. a run where a proficient human simply translates the topic into the document language(s) is not what most people think of as cross-language retrieval.
To further clarify this, here are some example query construction methodologies and their correct query construction classification. Note that these are only examples; many other methods may be used for automatic or manual query construction.
1. Queries constructed automatically from the topics, and the retrieval results of these queries sent to the CLEF results server --> automatic query construction
2. Queries constructed automatically from the topics, then expanded by a method that takes terms automatically from the top 30 documents (no human involved) --> automatic query construction
3. Queries constructed manually from the topics, and the results of these queries sent to the CLEF results server --> manual query construction
4. Queries constructed automatically from the topics, then modified by human selection of terms suggested from the top 30 documents --> manual query construction
Note that by including all types of human-involved runs in the manual query construction method we make it harder to compare work within this query construction method. We thus only allow automatic runs.
Participants are required to submit at least one baseline run without WSD and one run using the WSD data. They can submit four further baseline runs without WSD and four further runs using WSD in various ways. Only the Title and Description fields of the topics can be used to construct the queries; a sketch of such an automatic Title + Description query follows.
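As an illustration only, an automatic Title + Description run might assemble its queries roughly as follows. The element names ("topic", "identifier", "title", "desc") are assumptions about the topic markup rather than the official format; the essential points are that the narrative field is ignored and that no manual editing takes place.

# Minimal sketch of automatic Title + Description query construction.
# The element names below are assumptions about the topic markup;
# the narrative field is deliberately not used.
import xml.etree.ElementTree as ET

def build_queries(topic_file: str) -> dict[str, str]:
    """Map each topic identifier (e.g. "10.2452/451-AH") to a T+D query string."""
    queries = {}
    for topic in ET.parse(topic_file).getroot().iter("topic"):
        topic_id = topic.findtext("identifier", default="").strip()
        title = topic.findtext("title", default="")
        desc = topic.findtext("desc", default="")
        queries[topic_id] = f"{title} {desc}".strip()
    return queries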
WHAT TO DO WITH YOUR RESULTS
Your results must be sent to the CLEF results server (address to be communicated), respecting the submission deadlines (see below). Results have to be submitted in ASCII format, with one line per document retrieved. The lines have to be formatted as follows (fields 1-6):

10.2452/451-AH Q0 document.00072 0 0.017416 runidex1
The fields must be separated by ONE blank and have the following meanings:

1) Query identifier. Please use the complete DOI identifier of the topic (e.g. 10.2452/451-AH, not only 451). INPUT MUST BE SORTED NUMERICALLY BY QUERY NUMBER.

2) Query iteration (will be ignored; please use "Q0" for all experiments).

3) Document number (content of the <DOCNO> tag).

4) Rank 0..n (0 is the best matching document; if you retrieve 1000 documents per query, ranks will be 0..999, with 0 best and 999 worst). Note that the rank starts at 0 (zero) and not 1 (one). MUST BE SORTED IN INCREASING ORDER PER QUERY.

5) RSV value (a system-specific value that expresses how relevant your system deems a document to be; this is a floating point value, and high relevance should be expressed with a high value). If a document D1 is considered more relevant than a document D2, this must be reflected in the fact that RSV1 > RSV2. If RSV1 = RSV2, the documents may be randomly reordered during calculation of the evaluation measures. Please use a decimal point ".", not a comma, and do not use any form of thousands separator. The only legal characters for the RSV values are 0-9 and the decimal point. MUST BE SORTED IN DECREASING ORDER PER QUERY.

6) Run identifier (please choose a unique ID for each experiment you submit). Only use a-z, A-Z and 0-9; no special characters, accents, etc.
The fields are separated by a single space. The file contains nothing but lines formatted in the way described above. You are expected to retrieve 1000 documents per query. An experiment that retrieves a maximum of 1000 documents each for 20 queries therefore produces a file that contains a maximum of 20000 lines.
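As a rough sketch (not an official tool), a run file in this format could be written as follows. The `results` structure, a mapping from topic identifier to (document number, RSV) pairs, is an assumption about your own system's output; the sketch sorts topics numerically, orders documents by decreasing RSV, assigns 0-based ranks and caps each query at 1000 documents.

# Minimal sketch of writing a run file in the required six-field format:
# query id, "Q0", document number, 0-based rank, RSV, run identifier,
# separated by single blanks, at most 1000 documents per query.
def topic_number(query_id: str) -> int:
    # e.g. "10.2452/451-AH" -> 451; adjust if the identifier scheme differs
    return int(query_id.split("/")[-1].split("-")[0])

def write_run(results: dict[str, list[tuple[str, float]]],
              run_id: str, path: str, max_docs: int = 1000) -> None:
    with open(path, "w", encoding="ascii") as out:
        for query_id in sorted(results, key=topic_number):    # numeric topic order
            ranked = sorted(results[query_id], key=lambda p: p[1], reverse=True)
            for rank, (docno, rsv) in enumerate(ranked[:max_docs]):
                out.write(f"{query_id} Q0 {docno} {rank} {rsv:.6f} {run_id}\n")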
You should know that the effectiveness measures used in CLEF evaluate the performance of systems at various points of recall. Participants must thus return at most 1000 documents per query in their results. Please note that by its nature, the average precision measure does not penalize systems that return extra irrelevant documents at the bottom of their result lists. Therefore, you will usually want to use the maximum number of allowable documents in your official submissions. If you knowingly retrieved fewer than 1000 documents for a topic, please take note of that and check your numbers against those reported by the system during the submission.
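Before submitting, an informal sanity check along the lines below may help catch formatting problems early: six space-separated fields per line, "Q0" iteration, 0-based increasing ranks, non-increasing RSVs, legal characters, and at most 1000 documents per query. This is only a sketch, not the official input checker mentioned further down.

# Informal sanity check of a run file; not the official TREC/CLEF input checker.
import re
from collections import defaultdict

def check_run(path: str, max_docs: int = 1000) -> list[str]:
    errors = []
    per_query = defaultdict(list)               # query id -> [(rank, rsv), ...]
    with open(path, encoding="ascii") as f:
        for n, line in enumerate(f, 1):
            fields = line.rstrip("\n").split(" ")
            if len(fields) != 6:
                errors.append(f"line {n}: expected 6 fields separated by one blank")
                continue
            qid, iteration, docno, rank, rsv, run_id = fields
            if iteration != "Q0":
                errors.append(f"line {n}: query iteration should be 'Q0'")
            if not re.fullmatch(r"[0-9.]+", rsv):
                errors.append(f"line {n}: illegal characters in RSV value")
            if not re.fullmatch(r"[A-Za-z0-9]+", run_id):
                errors.append(f"line {n}: illegal characters in run identifier")
            try:
                per_query[qid].append((int(rank), float(rsv)))
            except ValueError:
                errors.append(f"line {n}: rank must be an integer, RSV a number")
    for qid, entries in per_query.items():
        ranks = [r for r, _ in entries]
        rsvs = [s for _, s in entries]
        if len(entries) > max_docs:
            errors.append(f"{qid}: more than {max_docs} documents retrieved")
        if ranks != list(range(len(ranks))):
            errors.append(f"{qid}: ranks are not 0..n-1 in increasing order")
        if any(rsvs[i] < rsvs[i + 1] for i in range(len(rsvs) - 1)):
            errors.append(f"{qid}: RSV values are not in decreasing order")
    return errors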
You will have to
submit each run through the DIRECT system. An E-mail will be sent to
you explaining how to submit your results.
N.B. Please read the following very carefully.
In all of the above tasks, in order to facilitate comparison between results, there are two mandatory runs: a Title + Description run without the WSD annotations and a Title + Description run using the WSD annotations. In addition, you can submit four further baseline runs without WSD and four further runs using WSD in various ways.
The deadline for submission of results for the Robust-WSD task is midnight (24.00) Central European Time, 1st of June. Detailed information on how and where to submit your results will be communicated in due time.
An input checker program, used by TREC and modified to meet the requirements of CLEF, can be accessed here.
WORKING NOTES
A clear description of the strategy adopted and the resources you used for each run MUST be given in your paper for the Working Notes. The deadline for receipt of these papers is 30 August 2009. The Working Notes will be distributed to all participants on registration at the Corfu Workshop (30 September - 2 October 2009). This information is considered of great importance; the point of the CLEF activity is to give participants the opportunity to compare system performance with respect to variations in approaches and resources. Groups that do not provide such information risk being excluded from future CLEF experiments.
-----------------------------------------------------------------------