Erasmus Mundus Master in Language
and Communication Technologies (LCT)


Language & communication technologies

University of the Basque Country

Building language resources (annotation and evaluation)

The main aim of this course is to show the importance of linguistic resources (databases, knowledge bases, and corpora tagged at different linguistic levels) and of designing them appropriately, so that it is possible to learn from them automatically, feed new information back into them, and evaluate them quantitatively and qualitatively. Multilingual and cross-lingual issues are also emphasized. The course follows a practical, hands-on approach.

Syllabus

  1. Introduction to language resources
  2. Knowledge bases and related corpora: wordnets, MCR, SemCor, SUMO...
    a. Linguistic issues: conceptual gaps, cultural concepts...
    b. Applications: summarization and text simplification
  3. Syntactic-semantic databases and related corpora: VerbNet/PropBank, NOMLEX/NomBank, FrameNet
    a. Linguistic issues: semantic roles, semantic classes, argument structure, lexical entries...
  4. Annotation
    a. Word Similarity (WS), Semantic Textual Similarity (STS). Linguistic issues: antonymy, similarity
    b. Sentiment analysis. Linguistic issues: sentiments, polarity
  5. Corpora evaluation: Intercoder Agreement, R basics
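
As an illustration of the intercoder agreement topic in item 5, the sketch below computes Cohen's kappa for two annotators who labelled the same items. It is only a minimal example and not course material: the function name `cohen_kappa` and the toy polarity labels are assumptions made here, and Python is used purely for illustration (the course itself mentions R).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper written for this illustration.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list.")

    n = len(labels_a)

    # Observed agreement: proportion of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy polarity annotations (invented for the example).
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so a value near 0 means the annotators agree no more often than chance would predict, while 1 means perfect agreement.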

