Erasmus Mundus Master in Language
and Communication Technologies (LCT)


Language & communication technologies

University of the Basque Country

Building language resources (annotation and evaluation)

The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases and corpora tagged at different linguistic levels) and of designing them appropriately, so that it is possible to learn from them automatically, enrich them with the newly acquired information, and evaluate them quantitatively and qualitatively. Multilingual and interlinguistic issues are also emphasized. The course follows a practical, hands-on approach.

Syllabus

Introduction to language resources
Knowledge-bases and related corpora: wordnets, MCR, SemCor, SUMO...
a. Linguistic issues: conceptual gaps, cultural concepts...
b. Applications: summarization and text simplification
Syntactic-semantic databases and related corpora: VerbNet/PropBank, NOMLEX/NomBank, FrameNet
a. Linguistic issues: semantic roles, semantic classes, argument structure, lexical entries...
Annotation
a. Word Similarity (WS), Semantic Textual Similarity (STS). Linguistic issues: antonymy, similarity
b. Sentiment analysis. Linguistic issues: sentiments, polarity
Corpora evaluation: Intercoder Agreement, R basics
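
As an illustration of the intercoder agreement topic, the following is a minimal sketch of Cohen's kappa for two annotators, one common agreement measure. It is not taken from the course materials: the function name `cohen_kappa` and the polarity labels are hypothetical, and Python is used here only for illustration, although the syllabus mentions R.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical polarity annotations for eight sentences (illustrative data).
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Here the annotators agree on 6 of 8 items (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5. Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over a plain percentage when evaluating annotated corpora.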

