Speaker: Mark Stevenson (The University of Sheffield)
Date: May 14, 2010
Time: 16:00
Where: Computer Science Faculty, room 3.17
Like text in other domains, biomedical documents contain many terms with more than one possible meaning. These ambiguities are a significant obstacle to the automatic processing of such texts. Previous approaches to resolving them have drawn on a variety of knowledge sources, including the context in which the ambiguous term is used and domain-specific resources such as the Unified Medical Language System (UMLS). We compare a range of previously used knowledge sources and introduce a novel one: Medical Subject Headings (MeSH) terms. The best performance is obtained by combining linguistic features with MeSH terms, and it exceeds previously reported results on a standard test set.
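As a rough illustration of combining linguistic context with MeSH terms in a supervised setting, the Python sketch below merges both into one feature list and trains a small Naive Bayes classifier. The abstract does not state which features or learning algorithm the talk uses, so the window size, the feature prefixes, the `extract_features` helper, the `NaiveBayes` class, and the toy sentences and MeSH labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, target_index, mesh_terms, window=3):
    """Merge local-context words with document-level MeSH terms (illustrative)."""
    features = []
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index:
            features.append("ctx=" + tokens[i].lower())
    for term in mesh_terms:
        features.append("mesh=" + term)
    return features


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (a stand-in learner)."""

    def __init__(self):
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            for f in features:
                self.feature_counts[sense][f] += 1
                self.vocab.add(f)

    def predict(self, features):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[sense].values()) + len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[sense][f] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy annotated examples for the ambiguous term "cold" (invented for this sketch).
training = [
    (extract_features("the patient caught a cold last week".split(), 4,
                      ["Common Cold", "Virus Diseases"]), "illness"),
    (extract_features("samples were stored in cold water overnight".split(), 4,
                      ["Temperature", "Water"]), "temperature"),
]

classifier = NaiveBayes()
classifier.train(training)

test_features = extract_features("she had a cold and a cough".split(), 3,
                                 ["Common Cold"])
predicted = classifier.predict(test_features)
```

Prefixing features with `ctx=` and `mesh=` keeps the two knowledge sources distinct, so a word that appears both in the context and as a MeSH term is counted as two separate pieces of evidence.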
Our approach is supervised and therefore relies on annotated training examples. We present a novel method for automatically acquiring additional training data, based on the relevance feedback technique from Information Retrieval, and show that the examples it generates lead to a further increase in performance.
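One way to picture relevance feedback as a source of extra training data is sketched below in Python. It is only a Rocchio-style approximation under stated assumptions, not the method presented in the talk: terms that are frequent in examples of one sense and rare in examples of the other senses become a query, and unlabeled documents matching enough query terms are added as new examples of that sense. The function names, the `k` and `min_hits` thresholds, and the toy documents are hypothetical.

```python
from collections import Counter


def select_query_terms(relevant_docs, nonrelevant_docs, k=3):
    """Pick terms common in relevant docs and rare in non-relevant ones."""
    if not relevant_docs:
        return []
    rel = Counter()
    for doc in relevant_docs:
        rel.update(set(doc))
    nonrel = Counter()
    for doc in nonrelevant_docs:
        nonrel.update(set(doc))
    scores = {
        term: rel[term] / len(relevant_docs)
        - nonrel[term] / max(1, len(nonrelevant_docs))
        for term in rel
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked[:k] if scores[t] > 0]


def acquire_examples(query_terms, unlabeled_docs, sense, min_hits=2):
    """Label unlabeled docs with `sense` when they match enough query terms."""
    new_examples = []
    for doc in unlabeled_docs:
        hits = sum(1 for term in query_terms if term in doc)
        if hits >= min_hits:
            new_examples.append((doc, sense))
    return new_examples


# Toy labeled documents for two senses of "cold" (invented for this sketch).
illness_docs = [
    ["cold", "virus", "cough", "fever"],
    ["cold", "virus", "symptoms", "cough"],
]
temperature_docs = [
    ["cold", "water", "storage", "temperature"],
    ["cold", "temperature", "exposure"],
]
unlabeled_docs = [
    ["patient", "cold", "cough", "virus"],
    ["cold", "storage", "temperature"],
    ["cold", "fever"],
]

query = select_query_terms(illness_docs, temperature_docs, k=3)
new_examples = acquire_examples(query, unlabeled_docs, "illness")
```

The ambiguous term itself ("cold") scores zero because it appears in every labeled document, so it never enters the query; only terms that discriminate between senses are used to retrieve new examples.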
www.dcs.shef.ac.uk/~marks/
nlp.shef.ac.uk/
[…] Mark Stevenson’s invited talk: ‘Disambiguation of Biomedical Text’ (2010/05/14) […]