EPEC-DEP (BDT)

A syntactically annotated corpus based on Dependency Grammar Theory

Syntactically annotated EPEC corpus (Basque Dependency Treebank)

Description (English):

The Basque Dependency Treebank (EPEC-DEP) is the syntactically annotated version of EPEC, the reference corpus for the processing of Basque. EPEC is a 300,000-word corpus of standard written Basque, drawn largely from newspaper text, intended as a training corpus for developing and improving Natural Language Processing tools. It has been annotated, manually and automatically, at several levels: morphology, partial syntax and semantics.

Description:

The Basque treebank (EPEC-DEP) is the Reference Corpus for the Processing of Basque (EPEC), manually annotated at the syntactic level on the basis of dependency relations. EPEC is a collection of texts written in standard Basque totalling 300,000 words. One third was taken from the Statistical Corpus of 20th-Century Basque (www.euskaracorpusa.net) and the remaining two thirds from the newspaper Euskaldunon Egunkaria. It is annotated at several levels (morphology, partial syntax and semantics) using both manual and automatic methods.
In the EPEC-DEP treebank, 200,000 words have been manually annotated following Dependency Grammar Theory (Tesnière, 1959). In this theory, the syntactic tree of a sentence (also called its dependency tree) is built by linking the words of the sentence in pairs. These trees represent, on the one hand, the head/dependent relations between the words at the nodes and, on the other hand, the syntactic function that the dependent fulfils in each link between two words, expressed by means of dependency labels (Aranzabe, 2008).
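The head/dependent structure described above can be made concrete with a short sketch. The Python below stores a hypothetical four-word Basque sentence ("Jonek liburua irakurri du", 'Jon has read the book') as one row per word holding its head index and a dependency label, in a CoNLL-like layout chosen only for illustration; this page does not say which file format the EPEC-DEP download uses. The labels are UD-style placeholders, not the dependency tagset of Aranzabe (2008), and `parse`, `is_tree` and `arcs` are hypothetical helpers.

```python
# Toy, hypothetical analysis in a CoNLL-like layout: ID, FORM, HEAD, DEPREL.
# HEAD 0 is the artificial root. Labels are UD-style placeholders, NOT the
# EPEC-DEP dependency tagset, and the real file format is an assumption.
SENTENCE = "\n".join([
    "1\tJonek\t3\tnsubj",    # 'Jon' (ergative), subject of irakurri
    "2\tliburua\t3\tobj",    # 'the book', object of irakurri
    "3\tirakurri\t0\troot",  # 'read', main verb attached to the root
    "4\tdu\t3\taux",         # auxiliary 'has', attached to irakurri
])


def parse(block):
    """Return a list of (id, form, head, deprel) tuples, one per word."""
    rows = []
    for line in block.splitlines():
        idx, form, head, deprel = line.split("\t")
        rows.append((int(idx), form, int(head), deprel))
    return rows


def is_tree(rows):
    """Check that the head links form a single tree rooted at node 0."""
    heads = {idx: head for idx, _, head, _ in rows}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False  # exactly one word must attach to the artificial root
    for start in heads:
        seen = set()
        node = start
        while node != 0:  # follow head links upward until the root
            if node in seen or node not in heads:
                return False  # a cycle, or a head pointing to a missing word
            seen.add(node)
            node = heads[node]
    return True


def arcs(rows):
    """Return (head_form, deprel, dependent_form) triples, one per binary link."""
    forms = {idx: form for idx, form, _, _ in rows}
    forms[0] = "ROOT"
    return [(forms[head], deprel, form) for _, form, head, deprel in rows]


if __name__ == "__main__":
    rows = parse(SENTENCE)
    print("well-formed tree:", is_tree(rows))
    for head, label, dependent in arcs(rows):
        print(f"{head} -{label}-> {dependent}")
```

Storing a single head index per word is enough to recover every pairwise link, and `is_tree` checks the property that makes the result a dependency tree: exactly one word attached to the root and no cycles.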

Download

Link to access online or download:

http://www.ixa.eus/epec-dep-deskarga

Type:

Corpora

Contact person:

Maxux Aranzabe

Contact email:

maxux.aranzabe@ehu.eus

Research group:

IXA-UPV/EHU

Basque


Grammars and language models

EDGK

Rule-based Dependency Grammar for Basque

BERTeus

BERT language model for Basque


Tools and services

Averell: a Python library and command-line interface to download and standardize corpora from ten multilingual poetry repositories.
Jollyjumper: an enjambment detection Python library for Spanish.
Rantanplan: a Python library for the automated scansion of Spanish poetry.
PoetryLab app: PoetryLab, an open-source toolkit for the analysis of Spanish poetry corpora.
PDMapping: a tool for documenting and analyzing speakers' judgments about spatial and sociocultural linguistic variation.
Ferramenta On-Line de ExpeRimentación PerceptivA (FOLErPa): an online tool for carrying out perceptual experiments.
Cartografía dos apelidos de Galicia: a research tool for studying the geographical distribution of surnames in Galicia.
Vocabulary analyzer Web Service: calculates several lexicometric measures (tokens, types, hapaxes and type/token ratio) and displays them graphically.
Pedersen's Ngram Statistics: the Ngram Statistics Package by Pedersen.
UPF FreeLing-based part-of-speech tagger: a part-of-speech tagger from UPF built on FreeLing.
Dependency relation analysis: a web service that performs dependency parsing with Bohnet's graph-based parser. Input is plain text or CoNLL format; the supported languages are English and Spanish.
FreeLing Named Entity Recognition (NER): FreeLing-based named entity recognition.
WSD-IXA: word-sense disambiguation.
Ixa pipes: multilingual NLP tools.
ixaKat: a modular chain of natural language processing tools for Basque.
Maltixa: a statistical syntactic analyzer for Basque.
Eustagger: a morphosyntactic tagger for Basque.
Xuxen: a spelling and grammar checker for Basque.
BASYQUE: a web application to analyze syntactic variation in Basque dialects.
Analhitza: a category analyzer.
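The Vocabulary analyzer Web Service listed above reports tokens, types, hapaxes and the type/token ratio. As a rough illustration of what those measures mean, and not of how that service actually computes them, the hypothetical Python function below uses a naive lowercase whitespace tokenizer (an assumption) and defines the type/token ratio as the number of distinct types divided by the number of tokens.

```python
from collections import Counter


def lexicometrics(text):
    """Compute tokens, types, hapaxes and type/token ratio for a text.

    Hypothetical helper: tokenization is a naive lowercase whitespace
    split, an assumption for illustration; the real service may differ.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)  # frequency of each distinct word form (type)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Hapaxes (hapax legomena) are types that occur exactly once.
    hapaxes = sorted(word for word, freq in counts.items() if freq == 1)
    ttr = n_types / n_tokens if n_tokens else 0.0  # avoid division by zero
    return {"tokens": n_tokens, "types": n_types, "hapaxes": hapaxes, "ttr": ttr}


if __name__ == "__main__":
    print(lexicometrics("the cat saw the dog"))
```

For "the cat saw the dog" it returns 5 tokens, 4 types, the hapaxes cat, dog and saw, and a type/token ratio of 0.8.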
