Eustagger

Morphosyntactic tagger for Basque

Euskararako etiketatzaile morfosintaktikoa

Description (en):

Eustagger is a robust, wide-coverage morphological analyser and part-of-speech (PoS) tagger for Basque. The analyser is based on the two-level formalism and was designed incrementally around three main modules: the standard analyser, the analyser of linguistic variants, and the analyser without lexicon, which can recognize word-forms whose lemmas are not in the lexicon. Using lexical transducers for the analyser has improved both the performance of the system's components and the description itself. For each token, Eustagger provides the possible lemmas, PoS and other morphological information. It also recognizes date/time expressions and numbers.

Disambiguation combines the Constraint Grammar (CG) formalism with an HMM-based tagger. CG rules are applied using all the morphological features, which reduces the morphological ambiguity of the text. The stochastic (HMM) tagger then selects a single tag from those that remain. Using the stochastic method alone, the error rate is about 14%, although accuracy can be increased by about 2% by enriching the lexicon with the unknown words. When both methods are combined, the error rate of the whole process is 3.5%.
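One way to picture the three-module design described above is as a fallback cascade. The following is a minimal Python sketch under stated assumptions, not Eustagger's implementation: the real analyser uses two-level lexical transducers, whereas this sketch uses plain dictionaries. The order in which the modules are tried, all function names, and the English toy data are hypothetical and chosen only for illustration.

```python
# Toy stand-ins for the three modules. The real analyser uses two-level
# lexical transducers; plain dictionaries are used here only for illustration.

def standard_analyser(word, lexicon):
    """Look the word-form up directly in the lexicon."""
    return lexicon.get(word, [])

def variant_analyser(word, lexicon, variants):
    """Map a known non-standard variant to its standard form, then look it up."""
    standard = variants.get(word)
    return lexicon.get(standard, []) if standard is not None else []

def lexicon_free_analyser(word, suffix_tags):
    """Guess an analysis from the longest matching suffix, without a lexicon."""
    for suffix in sorted(suffix_tags, key=len, reverse=True):
        if suffix and len(word) > len(suffix) and word.endswith(suffix):
            return [(word[: -len(suffix)], suffix_tags[suffix])]
    return [(word, "UNKNOWN")]

def analyse(word, lexicon, variants, suffix_tags):
    """Try each module in turn; the first that returns analyses wins (assumed order)."""
    return (
        standard_analyser(word, lexicon)
        or variant_analyser(word, lexicon, variants)
        or lexicon_free_analyser(word, suffix_tags)
    )

# Hypothetical toy data (English placeholders, not Basque morphology).
LEXICON = {"colour": [("colour", "NOUN")]}
VARIANTS = {"color": "colour"}
SUFFIX_TAGS = {"ing": "VERB", "s": "NOUN"}

for w in ["colour", "color", "blorping", "xyz"]:
    print(w, analyse(w, LEXICON, VARIANTS, SUFFIX_TAGS))
```

In this sketch a word-form only reaches the lexicon-free module when the two lexicon-based modules return nothing, so an out-of-lexicon form still receives a candidate (lemma, tag) pair.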

Description:

Eustagger is a robust, wide-coverage morphosyntactic analyser. The analyser is based on the two-level formalism and was designed incrementally, with three main modules: the standard analyser, the analyser of linguistic variants, and the lexicon-free analyser, which can recognize word-forms that are not in the lexicon. By using lexical transducers we have improved both the performance of the different components of the analyser and the description itself. It also provides the possible lemmas, PoS and other morphological information. Likewise, it recognizes date/time expressions and numbers. The methods we have used for disambiguation are the Constraint Grammar (CG) formalism and an HMM-based tagger. The CG rules were written taking all the morphological features into account, and this process reduces the morphological ambiguity of texts. Finally, we use the stochastic tool to select one of the remaining tag options. Using only the stochastic method, the error rate is around 14%, but accuracy can be increased by around 2% by enriching the lexicon with the unknown words. When the two methods are combined, the error rate of the whole process is 3.5%.
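The two-stage disambiguation described above (constraint rules first, then a stochastic choice among the remaining tags) can be sketched as follows. This is a hedged illustration, not Eustagger's code: the single rule, the tag names, the probabilities and the function names are all hypothetical, and the HMM step is a plain bigram Viterbi search restricted to the tags that survive the rules.

```python
def apply_constraints(sentence, rules):
    """CG-style step: a rule may discard readings, but never the last one."""
    reduced = []
    for i, (token, tags) in enumerate(sentence):
        remaining = list(tags)
        for rule in rules:
            kept = [t for t in remaining if rule(sentence, i, t)]
            if kept:  # ignore a rule that would remove every reading
                remaining = kept
        reduced.append((token, remaining))
    return reduced

def viterbi(sentence, trans, emit, start="<s>", floor=1e-6):
    """HMM step: pick the most probable tag sequence among the remaining tags."""
    best = {start: (1.0, [])}  # tag -> (probability, best path ending in tag)
    for token, tags in sentence:
        step = {}
        for tag in tags:
            e = emit.get((tag, token), floor)
            prob, path = max(
                ((p * trans.get((prev, tag), floor) * e, pth)
                 for prev, (p, pth) in best.items()),
                key=lambda x: x[0],
            )
            step[tag] = (prob, path + [tag])
        best = step
    return max(best.values(), key=lambda x: x[0])[1]

# Hypothetical rule: no verbal reading right after an unambiguous determiner.
def no_verb_after_det(sentence, i, tag):
    if i > 0 and sentence[i - 1][1] == ["DET"]:
        return tag not in {"VERB", "AUX"}
    return True

# Toy input: each token with the candidate tags a morphological analyser might return.
sentence = [("the", ["DET"]), ("can", ["NOUN", "VERB", "AUX"]), ("rusts", ["VERB", "NOUN"])]
# Toy model parameters (invented values, not estimated from any corpus).
trans = {("<s>", "DET"): 0.6, ("DET", "NOUN"): 0.8, ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2}
emit = {("DET", "the"): 0.5, ("NOUN", "can"): 0.01, ("VERB", "rusts"): 0.01, ("NOUN", "rusts"): 0.001}

reduced = apply_constraints(sentence, [no_verb_after_det])
tags = viterbi(reduced, trans, emit)
print(reduced)
print(tags)
```

The rule step only narrows the candidate sets and leaves "rusts" ambiguous; the Viterbi step then commits to exactly one tag per token, which mirrors the division of labour between the two methods.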

On request

Demo

Link to access online or download:

http://ixa2.si.ehu.eus/demo/analisimorf.jsp

Type:

Tools and services

Contact person:

Nerea Ezeiza

Contact person email:

nerea.ezeiza@ehu.eus

Research group:

IXA-UPV/EHU

Basque

Grammars and language models

EDGK

Rule-based Dependency Grammar for Basque

BERTeus

BERT language model for Basque

Tools and services

Averell

Averell is a Python library and command line interface to download and to standardize corpora from ten multi-lingual poetry repositories

Jollyjumper

Jollyjumper is our enjambment detection Python library for Spanish

Rantanplan

Rantanplan is a Python library for the automated scansion of Spanish poetry

PoetryLab app

PoetryLab: An Open Source Toolkit for the Analysis of Spanish Poetry Corpora

PDMapping

Tool for documenting and analyzing speakers' judgments about spatial and sociocultural linguistic variation.

Ferramenta On-Line de ExpeRimentación PerceptivA (FOLErPa)

FOLERPA is an online tool for carrying out perceptual experiments.

Cartografía dos apelidos de Galicia

Research tool for the study of the geographical distribution of surnames in Galicia.

Vocabulary analyzer Web Service

This web service calculates different lexicometric measures and displays them graphically (tokens, types, hapaxes & type/token ratio).

Ngram Statistics de Pedersen

Pedersen's Ngram Statistics Package

UPF Freeling-based part-of-speech tagger.

This is the UPF Freeling-based part-of-speech tagger.

Análisis de relaciones de dependencias

This WS performs dependency parsing using Bohnet's graph-based Parser. The input is text in plain text or CoNLL format. The languages supported are English and Spanish.

Freeling Named Entity Recognition - NER

Freeling-based Named Entity Recognition - NER

WSD-IXA

Word-Sense Disambiguation

Ixa pipes

Multilingual NLP tools

ixaKat

A modular chain of Natural Language Processing tools for Basque

Maltixa

Statistical Syntactic analyzer for Basque

Eustagger

Morphosyntactic tagger for Basque

Xuxen

Spelling and grammar checker for Basque

BASYQUE

A web application to analyse syntactic variation of Basque dialects

Analhitza

Category analyzer
