Guidelines for Participation in the CLEF 2008 Ad-Hoc Track: Robust WSD Task
Also see: http://ixa2.si.ehu.es/clirwsd/
In these Guidelines, we provide information on the test collections, the tasks, data manipulation, query construction and results submission for the Robust WSD task of the CLEF 2008 Ad-Hoc track. Guidelines for the other CLEF tracks can be found on the dedicated webpages for these tracks.
MAIN TEST COLLECTION
The main test collection in CLEF 2008 consists of comparable documents from national newspapers. In CLEF 2008 the Robust WSD monolingual and Robust WSD bilingual tasks use only the English collections (LA Times '94 and Glasgow Herald '95).
Topics are released using the correct diacritics (according to the language) but may contain occasional spelling errors/inconsistencies and minor formatting deficiencies. We aim to keep these to a minimum.
TASKS
- monolingual IR (English)
- bilingual IR (to English)
Much of the evaluation methodology adopted for these two tasks in CLEF is an adaptation of the strategy used for the TREC ad-hoc task. The instructions given below have been derived from those distributed by TREC. We hope that they are clear and comprehensive. However, please do not hesitate to ask for clarifications or further information if you need it. Send queries to the organizers.
TOPICS
Only the following document fields may be used for automatic retrieval:
LA Times 1994: HEADLINE, TEXT only
Glasgow Herald 1995: HEADLINE, TEXT only
Learning from such fields (e.g. building translation resources from them) is permissible.
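As an illustration of restricting indexing to these fields, the sketch below keeps only the HEADLINE and TEXT content of each document. It assumes the usual TREC-style SGML layout (<DOC>, <DOCNO>, <HEADLINE>, <TEXT> tags) and simple regex parsing; the actual collection files may require a more careful parser, so treat this only as a sketch.

import re

# Sketch only: keep just the fields allowed for automatic retrieval.
# Tag names are assumed to follow the usual TREC-style SGML layout.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
HEADLINE_RE = re.compile(r"<HEADLINE>(.*?)</HEADLINE>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)

def iter_indexable_docs(path):
    """Yield (docno, indexable_text) pairs using only HEADLINE and TEXT."""
    with open(path, encoding="latin-1") as fh:
        raw = fh.read()
    for doc in DOC_RE.findall(raw):
        docno = DOCNO_RE.search(doc)
        if docno is None:
            continue
        parts = [m.group(1) for m in (HEADLINE_RE.search(doc), TEXT_RE.search(doc)) if m]
        yield docno.group(1), " ".join(parts)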
GUIDELINES FOR CONSTRUCTING THE QUERIES
The queries are constructed from the topics. Each topic consists of three fields: a brief title statement; a one-sentence description; and a more complex narrative specifying the relevance assessment criteria. Queries can consist of one or more of these fields.
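For example, an automatic Title + Description (TD) query might be built as in the sketch below. The topic tag names (<top>, <num>, <title>, <desc>) are assumptions based on the usual TREC/CLEF topic layout and may differ slightly in the released topic files.

import re

# Sketch only: build TD queries by concatenating the title and description
# fields of each topic and ignoring the narrative. Adjust the tag names to
# match the released topic files.
TOPIC_RE = re.compile(r"<top>(.*?)</top>", re.DOTALL)

def field(topic, tag):
    m = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", topic, re.DOTALL)
    return m.group(1) if m else ""

def build_td_queries(topic_file):
    """Return {topic_id: query_string} for a TD run."""
    with open(topic_file, encoding="utf-8") as fh:
        raw = fh.read()
    queries = {}
    for topic in TOPIC_RE.findall(raw):
        topic_id = field(topic, "num")   # e.g. "10.2452/451-AH"
        queries[topic_id] = f'{field(topic, "title")} {field(topic, "desc")}'.strip()
    return queries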
There are many possible methods for converting the supplied topics into queries that your system can execute. We have broadly defined two generic methods, "automatic" and "manual", based on whether manual intervention is used or not. When more than one set of results is submitted, the different sets may correspond to different query construction methods or, if desired, can be variants within the same method. Only automatic runs are allowed in this task.
The manual query construction method includes BOTH runs in which the queries are constructed manually and then run without looking at the results AND runs in which the results are used to alter the queries using some manual operation. The distinction being made here is between runs in which there is no human involvement (automatic query construction) and runs in which there is some type of human involvement (manual query construction). It is clear that manual runs should be appropriately motivated in a CLIR context, e.g. a run where a proficient human simply translates the topic into the document language(s) is not what most people think of as cross-language retrieval.
To further clarify this, here are some example query construction methodologies and their correct query construction classification. Note that these are only examples; many other methods may be used for automatic or manual query construction.
1. Queries constructed automatically from the topics, with the retrieval results of these queries sent to the CLEF results server --> automatic query construction.
2. Queries constructed automatically from the topics, then expanded by a method that takes terms automatically from the top 30 documents (no human involved) --> automatic query construction.
3. Queries constructed manually from the topics, with the results of these queries sent to the CLEF results server --> manual query construction.
4. Queries constructed automatically from the topics, then modified by human selection of terms suggested from the top 30 documents --> manual query construction.
Note that by including all types of human-involved runs in the manual query construction method we make it harder to do comparisons of work within this query construction method. We thus only allow automatic runs.
Participants are required to submit at least one baseline run without WSD and one run using the WSD data. They can submit four further baseline runs without WSD and four runs using WSD in various ways.
WHAT TO DO WITH YOUR RESULTS
Your results must be sent to the CLEF results server (address to be communicated), respecting the submission deadlines (see below).
Results have to be submitted in ASCII format, with one line per document retrieved. The lines have to be formatted as follows (the numbers refer to the six fields described below):

10.2452/451-AH Q0 document.00072 0 0.017416 runidex1
      1        2        3        4    5        6
The fields must be separated by ONE blank and have the following meanings:
1) Query identifier.
Please use the complete DOI identifier of the topic (e.g. 10.2452/451-AH, not only 451).
INPUT MUST BE SORTED NUMERICALLY BY QUERY NUMBER.
2) Query iteration (will be ignored. Please choose "Q0" for all experiments).
3) Document number (content of the <DOCNO> tag).
4) Rank 0..n (0 is the best matching document; if you retrieve 1000 documents per query, rank will be 0..999, with 0 best and 999 worst). Note that rank starts at 0 (zero), not 1 (one).
MUST BE SORTED IN INCREASING ORDER PER QUERY.
5) RSV value (a system-specific value that expresses how relevant your system deems a document to be; this is a floating point value, and high relevance should be expressed with a high value). If a document D1 is considered more relevant than a document D2, this must be reflected in the fact that RSV1 > RSV2. If RSV1 = RSV2, the documents may be randomly reordered during calculation of the evaluation measures. Please use a decimal point ".", not a comma, and do not use any form of thousands separators. The only legal characters for the RSV values are 0-9 and the decimal point.
MUST BE SORTED IN DECREASING ORDER PER QUERY.
6) Run identifier (please choose a unique ID for each experiment you submit). Only use a-z, A-Z and 0-9. No special characters, accents, etc.
The fields are separated by a single space. The file must contain nothing but lines formatted in the way described above.
You are expected to retrieve 1000 documents per query. An experiment that retrieves a maximum of 1000 documents each for 20 queries therefore produces a file that contains a maximum of 20000 lines.
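The sketch below shows one way to serialise ranked results into this format while enforcing the required sort orders (queries in numerical order, ranks increasing from 0, RSVs decreasing). The in-memory result structure and helper names are assumptions for illustration only.

# Sketch only: write results in the submission format described above.
# `results` maps a full topic DOI (e.g. "10.2452/451-AH") to a list of
# (docno, rsv) pairs; everything upstream of that is assumed.

def topic_sort_key(topic_id):
    # Sort numerically by the topic number embedded in the DOI,
    # e.g. "10.2452/451-AH" -> 451.
    return int(topic_id.split("/")[1].split("-")[0])

def write_run(results, run_id, out_path, max_docs=1000):
    assert run_id.isalnum(), "run id may only contain a-z, A-Z, 0-9"
    with open(out_path, "w", encoding="ascii") as out:
        for topic_id in sorted(results, key=topic_sort_key):
            ranked = sorted(results[topic_id], key=lambda p: p[1], reverse=True)
            for rank, (docno, rsv) in enumerate(ranked[:max_docs]):
                # Rank starts at 0; RSV uses a decimal point, no thousands separators.
                out.write(f"{topic_id} Q0 {docno} {rank} {rsv:.6f} {run_id}\n")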
N.B. Please read the following very carefully:
In all of the above tasks, in order to facilitate comparison between results, there are two mandatory runs per task: Title + Description (TD) runs with and without using the WSD annotations.
The absolute deadline for submission of results for the ad hoc and the domain-specific tracks is midnight (24.00) Central European Time, 19 June. Detailed information on how and where to submit your results will be communicated shortly.
An input checker program, used by TREC and modified to meet the requirements of CLEF, can be accessed here.
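As a rough illustration of the constraints listed above, a minimal local check might look like the sketch below. It is not the official checker and only approximates a few of its tests (six space-separated fields, "Q0", ranks 0..n in order, non-increasing RSVs, alphanumeric run identifier).

import re
from collections import defaultdict

# Sketch only: approximate a few of the submission-format checks.
LINE_RE = re.compile(r"^(\S+) (Q0) (\S+) (\d+) (\d+(?:\.\d+)?) ([A-Za-z0-9]+)$")

def check_run(path):
    last_rank = defaultdict(lambda: -1)
    last_rsv = defaultdict(lambda: float("inf"))
    with open(path, encoding="ascii") as fh:
        for lineno, line in enumerate(fh, start=1):
            m = LINE_RE.match(line.rstrip("\n"))
            if m is None:
                raise ValueError(f"line {lineno}: malformed line")
            topic, _, _, rank, rsv, _ = m.groups()
            rank, rsv = int(rank), float(rsv)
            if rank != last_rank[topic] + 1:
                raise ValueError(f"line {lineno}: ranks must increase from 0 per query")
            if rsv > last_rsv[topic]:
                raise ValueError(f"line {lineno}: RSVs must not increase per query")
            last_rank[topic], last_rsv[topic] = rank, rsv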
WORKING NOTES
A clear description of the strategy adopted and the resources you used for each run MUST be given in your paper for the Working Notes. The deadline for receipt of these papers is 15 August 2008. The Working Notes will be distributed to all participants on registration at the Aarhus CLEF Workshop (17-19 September 2008).
This information is considered of great importance; the point of the CLEF activity is to give participants the opportunity to compare system performance with respect to variations in approaches and resources. Groups that do not provide such information risk being excluded from future CLEF experiments.
------------------------------------------------------------------------
Last update: 3 March 2008.