Description
The QA-WSD task will bring semantic and retrieval evaluation together. Participants will be offered the same queries and document collections as in the main QA exercise, automatically annotated by word sense disambiguation (WSD) systems. The goal of the task is to test whether WSD can be used beneficially for Question Answering.
The exercise scenario is event-targeted QA on a news document collection. In the QA-WSD track only English monolingual and Spanish-to-English bilingual tasks are offered, i.e. English will be the only target language, and queries will be available in both English and Spanish. The queries will be the same as for the main QA exercise, and participation will follow the same process, except for the use of the sense-annotated data.
A large number of questions will be topic-related, i.e. grouped into clusters of questions that concern the same topic and may contain anaphoric references from one question to the others. Unlike the main QA task, Wikipedia articles are not included, and thus systems need to answer the questions that have an answer in the news document collection.
The goal of this task is to evaluate whether word sense information can help in certain queries. For this reason, participants are required to submit two runs for each monolingual or bilingual task in which they participate: one that does not use the sense annotations and one that does. Whenever possible, the use of sense information should be the only difference between the two runs. Participants who submit only a single run will be excluded from the evaluation.
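The paired-run requirement can be checked mechanically before submission. The following is a minimal sketch only: the tuple layout `(participant, task, uses_wsd)`, the task labels, and the function name `complete_run_pairs` are assumptions made for illustration and are not part of the official submission format.

```python
from collections import defaultdict


def complete_run_pairs(runs):
    """Return the (participant, task) keys that have both required runs.

    `runs` is an iterable of (participant, task, uses_wsd) tuples.
    This layout is hypothetical, chosen only for this illustration.
    """
    seen = defaultdict(set)
    for participant, task, uses_wsd in runs:
        # Record whether a run with or without sense annotations was seen.
        seen[(participant, task)].add(bool(uses_wsd))
    # A pair is complete only when both kinds of run are present.
    return sorted(key for key, flags in seen.items() if flags == {True, False})


# Invented example data: teamA submits both runs, teamB only one.
runs = [
    ("teamA", "EN-EN", False),
    ("teamA", "EN-EN", True),
    ("teamB", "ES-EN", True),
]
valid = complete_run_pairs(runs)
print(valid)
```

In this invented example only teamA's English monolingual submission would count, because teamB has no baseline run without sense annotations.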
The WSD data is based on WordNet version 1.6 and will be supplemented with data from the English and Spanish WordNets in order to test different expansion strategies. Several leading WSD experts will run their systems, and provide those WSD results for the participants to use.
Data formats, additional files, and source of WSD tags
These are the DTDs for the disambiguated queries and documents. Please check the sample disambiguated query (original) and document (original).
Spanish topics will be disambiguated using the first sense heuristic. English topics and documents will be annotated with the following word sense disambiguation systems:
1. Agirre, Eneko & Lopez de Lacalle, Oier (2007). UBC-ALM: Combining k-NN with SVD for WSD. Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval 2007). pp. 341-345. Prague, Czech Republic.
2. Chan, Yee Seng & Ng, Hwee Tou & Zhong, Zhi (2007). NUS-PT: Exploiting Parallel Texts for Word Sense Disambiguation in the English All-Words Tasks. Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval 2007). pp. 253-256. Prague, Czech Republic.
In order to expand from WordNet synset numbers to words in English and Spanish, you will need the following:
- The English WordNet version 1.6 available from here
- The Spanish WordNet, available free for research from here
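As a rough illustration of the expansion step, the sketch below maps a synset identifier to its member words in both languages. It assumes that the two WordNets have already been loaded into dictionaries keyed by a shared synset identifier. The identifiers, the dictionary names, and the entries are invented placeholders, not real WordNet 1.6 data, and the function `expand_synset` is hypothetical.

```python
# Hypothetical in-memory tables: synset identifier -> member words.
# The identifiers and entries are placeholders, NOT real WordNet 1.6 data.
ENGLISH_SYNSETS = {
    "00000001-n": ["bank", "depository financial institution"],
    "00000002-n": ["bank", "riverbank"],
}
SPANISH_SYNSETS = {
    "00000001-n": ["banco"],
    "00000002-n": ["orilla", "ribera"],
}


def expand_synset(synset_id, include_spanish=False):
    """Return the words associated with a synset identifier.

    English words come first; Spanish words are appended on request.
    Duplicates are removed while keeping the original order.
    """
    words = list(ENGLISH_SYNSETS.get(synset_id, []))
    if include_spanish:
        words.extend(SPANISH_SYNSETS.get(synset_id, []))
    return list(dict.fromkeys(words))


# Expanding one sense-tagged term into its English synonyms.
print(expand_synset("00000002-n"))
# The same synset with Spanish words added, e.g. for the bilingual task.
print(expand_synset("00000002-n", include_spanish=True))
```

Keeping the lookup keyed by synset identifier rather than by word form is what lets the sense annotation restrict expansion to the intended meaning. An unknown identifier simply yields an empty list.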