Building language resources (annotation and evaluation)
The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases and corpora tagged at different linguistic levels) and of designing them appropriately, so that systems can learn from them automatically, the resources can be enriched with new information, and they can be evaluated quantitatively and qualitatively. Multilingual and cross-lingual issues are also emphasized. The course takes a practical, hands-on approach to the contents.
Syllabus
Introduction to language resources
Knowledge bases and related corpora: wordnets, MCR, SemCor, SUMO...
a. Linguistic issues: conceptual gaps, cultural concepts...
b. Applications: summarization and text simplification
Syntactic-semantic databases and related corpora: VerbNet/PropBank, NOMLEX/NomBank, FrameNet
a. Linguistic issues: semantic roles, semantic classes, argument structure, lexical entries...
Annotation
a. Word Similarity (WS) and Semantic Textual Similarity (STS). Linguistic issues: antonymy, similarity
b. Sentiment analysis. Linguistic issues: sentiments, polarity
Corpora evaluation: Intercoder Agreement, R basics
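As an illustration of the intercoder agreement topic, the sketch below computes Cohen's kappa for two annotators. The syllabus does not say which agreement coefficient is taught, so kappa is chosen here only because it is a common one. The function name `cohen_kappa` and the sentiment labels are hypothetical, and the course itself works in R rather than Python.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: share of items that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over all categories.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    # Degenerate case: both annotators used one and the same label throughout.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical polarity annotations from two coders (illustrative data only).
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_2 = ["pos", "neg", "neg", "neg", "pos", "neg"]

kappa = cohen_kappa(coder_1, coder_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this toy example the coders agree on 5 of 6 items (observed agreement 0.833), while chance agreement is 0.5, so kappa comes out at about 0.667. Kappa is lower than raw agreement because it discounts the matches that would be expected even if both coders labelled at random with their own label frequencies.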