BIM and SAHCOBA: Building a morphosyntactically annotated Basque historical corpus

Basque in the Making (BIM): A Historical Look at a European Language Isolate and Syntactically Annotated Historical Corpus in Basque (SAHCOBA) are two projects that aim to build a morphosyntactically annotated historical corpus of Basque. The corpus will include both part-of-speech and syntactic annotation, together with a rich set of metadata. Our database will allow us to search the annotated corpus by word, lemma, grammatical category, sequence of grammatical categories, and specific structural configuration. The BIM project aims to collect the most significant works from the 15th century to the mid-18th century (Archaic and Old Basque), while the SAHCOBA project aims to extend the corpus from the mid-18th century to the mid-20th century (Early and Late Modern Basque), when standard Basque appeared. BIM and SAHCOBA are interdisciplinary projects that bring together experts in Linguistics and Natural Language Processing.


Ainara Estarrona